A new comprehensive and diverse few-shot acoustic classification benchmark.

CHeggan · Last update: Jan 04, 2024

MetaAudio-A-Few-Shot-Audio-Classification-Benchmark


Future Plans (Late 2023):

  • Release all pre-trained models for community use
  • Hyperparameter document

News

  • 10/01/2023: New MetaAudio sets released in the MT-SLVR paper
  • 06/09/2022: Presented MetaAudio at ICANN22; slides available in the repo
  • 01/07/2022: MetaAudio accepted to ICANN22, to be presented in early September 2022

Citation & Blog Breakdown

If you use any code or results from this work, please cite the following: ICANN22 Link or arXiv Link

@InProceedings{10.1007/978-3-031-15919-0_19,
author="Heggan, Calum
and Budgett, Sam
and Hospedales, Timothy
and Yaghoobi, Mehrdad",
editor="Pimenidis, Elias
and Angelov, Plamen
and Jayne, Chrisina
and Papaleonidas, Antonios
and Aydin, Mehmet",
title="MetaAudio: A Few-Shot Audio Classification Benchmark",
booktitle="Artificial Neural Networks and Machine Learning -- ICANN 2022",
year="2022",
publisher="Springer International Publishing",
address="Cham",
pages="219--230",
isbn="978-3-031-15919-0"
}

This work is licensed under Attribution-NonCommercial (CC BY-NC).

A new and (hopefully) more easily digestible blog post on MetaAudio can be found here!

Environment

We use miniconda for our experimental setup. For reproduction purposes, we include the environment file, which can be used to create the environment with the following command:

conda env create --name metaaudio --file torch_gpu_env.txt

Contents Overview

This repo contains the following:

  • Multiple problem statement setups with accompanying results which can be used moving forward as baselines for few-shot acoustic classification. These include:
    • Normal within-dataset generalisation
    • Joint training to both within and cross-dataset settings
    • Additional data -> simple classifier for cross-dataset
    • Length shifted and stratified problems for variable length dataset setting
  • Standardised meta-learning/few-shot splits for 5 distinct datasets from a variety of sound domains. This includes both baseline (randomly generated splits) as well as some more unique and purposeful ones such as those based on available meta-data and sample length distributions
  • Variety of algorithm implementations designed to deal with few-shot classification, ranging from 'cheap' traditional training pipelines to SOTA Gradient-Based Meta-Learning (GBML) models
  • Both Fixed and Variable length dataset processing pipelines
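To make the few-shot setup above concrete, here is a minimal, self-contained sketch of how an N-way K-shot episode is typically sampled from a labelled dataset. This is an illustrative stand-in using only the standard library, not the repo's actual sampling code; the function name `sample_episode` and its parameters are hypothetical.

```python
import random

def sample_episode(labels, n_way=5, k_shot=1, q_queries=5, rng=None):
    """Sample one N-way K-shot episode from a list of class labels.

    Picks n_way classes, then k_shot support and q_queries query
    examples per class. Returns (index, episode_label) pairs, where
    episode labels are re-mapped to 0..n_way-1.
    """
    rng = rng or random.Random(0)
    # Group dataset indices by their class label
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    classes = rng.sample(sorted(by_class), n_way)
    support, query = [], []
    for episode_label, c in enumerate(classes):
        picked = rng.sample(by_class[c], k_shot + q_queries)
        support += [(i, episode_label) for i in picked[:k_shot]]
        query += [(i, episode_label) for i in picked[k_shot:]]
    return support, query

# Example: a toy dataset with 20 classes and 30 clips per class
labels = [c for c in range(20) for _ in range(30)]
support, query = sample_episode(labels, n_way=5, k_shot=1, q_queries=5)
print(len(support), len(query))  # 5 support, 25 query examples
```

A 5-way 1-shot task, as used in the benchmark, therefore presents 5 labelled clips (one per class) and asks the model to classify the remaining query clips among those 5 classes.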

Algorithm Implementations

Algorithms are custom-built, operating on a similar framework with a common set of scripts. Those included in the paper are as follows:

For both MAML & Meta-Curvature we also make use of the Learn2Learn framework.
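For readers unfamiliar with GBML, the core MAML pattern (which Learn2Learn wraps with its `clone()`/`adapt()` API) can be illustrated on a one-parameter toy problem. This is a hedged, pure-Python sketch of the inner/outer loop only, not the repo's implementation; the task family (quadratic losses with per-task centres) and all names are illustrative.

```python
def maml_toy(task_centres, meta_lr=0.05, inner_lr=0.1, steps=200):
    """Toy MAML: each task t has loss (theta - c_t)^2.

    Inner loop: one gradient step per task to get adapted params phi.
    Outer loop: update the shared initialisation theta with the
    gradient of the post-adaptation loss, differentiated through
    the inner step (the hallmark of MAML).
    """
    theta = 0.0
    for _ in range(steps):
        meta_grad = 0.0
        for c in task_centres:
            # Inner adaptation: phi = theta - lr * dL/dtheta, L = (theta - c)^2
            phi = theta - inner_lr * 2 * (theta - c)
            # Outer gradient: d/dtheta of (phi - c)^2, noting dphi/dtheta = 1 - 2*inner_lr
            meta_grad += 2 * (phi - c) * (1 - 2 * inner_lr)
        theta -= meta_lr * meta_grad / len(task_centres)
    return theta

# The learned initialisation converges near the mean of the task centres,
# i.e. a point from which every task is reachable in one inner step.
theta = maml_toy([1.0, 2.0, 3.0])
```

In the real pipelines, `theta` is a network's weights, the inner loss is computed on an episode's support set, and the outer loss on its query set; Learn2Learn handles the differentiation through the inner updates.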

Datasets

We primarily cover 5 datasets for the majority of our experimentation; these are as follows:

In addition to these, however, we also include 2 extra datasets for cross-dataset testing:

as well as a proprietary version of AudioSet that we use for pre-training with simple classifiers. We obtained/scraped this dataset using the code from here:

We include sources for all of these datasets in Dataset Processing.
