Status and future plans
animal2vec
Max Planck Institute of Animal Behavior
Department for the Ecology of Animal Societies
Communication and Collective Movement (CoCoMo) Group
Current status
Meerkats
- MeerKAT is published at Max-Planck repository (1068h pretraining | 184h finetuning)
- animal2vec manuscript was rejected by Nature Methods, still under review at Methods in Ecology and Evolution
- By now, we have a new animal2vec finetune checkpoint using additional data from the 2022 campaign (Using a different audio recorder -> Soroka | Adds 61h for finetuning)
Current status
Coatis
- Preliminary results using the small model are available for Galaxy group
(300h pretraining | 52h finetuning)
- Large pretraining using the large model and all groups is currently running
(1805h pretraining | 92h finetuning)
Current status
Hyenas
- Results using the small model are available
(3900h pretraining | 11h finetuning)
Future plans
The idea
- Currently, we train (pretrain + finetune) new models for every species
- Computationally very expensive
- Pretraining takes a month. Finetuning, a couple of days
- What is usually done in industry is to do an extremely large-scale pretraining and then finetune with different and smaller datasets
- The broadly pretrained model is then called foundational model
- Bioacoustic researchers, regardless of the species they study, could start from such a foundational model and finetune on the data they have
Future plans
Why don't we have that already?
Sparsity: Animal vocalizations are short and rare events
- For example:
- MeerKAT [1]: 184h of labeled audio, of which 7.8h (4.2%) contain meerkat vocalizations
- BirdVox-full-night [2]: 4.5M clips, each of duration 150 ms, only 35k (0.7%) are positive
- Hainan gibbon calls [3]: 256h of fully labeled PAM data with 1246 few-seconds events (0.01%)
- Marine datasets have even higher reported sparsity levels [4]
It is very hard to train a foundational model having these levels of sparsity
[1] Schäfer-Zimmermann, J. C., et al. (2024). Preprint at arXiv:2406.01253
[2] Lostanlen, V., et al. (2018) IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
[3] Dufourq, E., et al. (2021) Remote Sensing in Ecology and Conservation, 7(3), 475-487.
[4] Allen, A. N., et al. (2021) Frontiers in Marine Science, 8, 607321
Status and roadmap for animal2vec 2.0
- Codewise: Full rewrite of the codebase almost done
- Hopefully brings huge improvements for train and inference speed
- Will include the sample-rate dependent positional encoding approach
- Various other technical improvements and novelties (another talk)
- Datawise: We already have a very large amount of bioacoustic data
- Everything from Xeno-Canto, iNaturalist, Tierstimmenarchiv (1.3M files -> 13k hours)
- This covers 96% of all known bird-species
- A large subset of Googles Audioset dataset (600 hours)
- Animal and Environmental sounds, plus various misc. stuff (noise, soundscapes, motor sounds, ...)
- 10-15k hours of marine data (Sanctuary Soundscape Monitoring Project (SanctSound), The William A. Watkins Collection, The Orchive, ...)
- We still have to take stock how much MPIAB data we use, but the expected order is ~10k hours
Summary
- We now have predictions for all CCAS species, and we are still working towards making them even better
- Meerkats: Current predictions are the best we can get with animal2vec
- Coati: Preliminary results are good. The best ones will be ready in 3-4 weeks
- Hyena: Pretraining status is good. Finetuning results can be improved with more labels, but are good
- We have an idea on how to tackle the domain-specific challenges in bioacoustics to build the next generation of animal2vec
- New codebase: Almost ready; Marius project will take another 1-2 months, then paper writing
- Dataset aggregation: Expected timeline is 1-2 months until everything is rudimentarily curated
- Computational resources: Will probably be provided by the Max Planck Computing and Data Facility
- Pretraining will take - in the best case - several months
Status and future plans
animal2vec
Max Planck Institute of Animal Behavior
Department for the Ecology of Animal Societies
Communication and Collective Movement (CoCoMo) Group