When benchmarking audio search algorithms, developers typically draw on datasets that provide diverse audio samples with reliable labels. Among the most common are the Million Song Dataset, AudioSet, and the Free Music Archive (FMA). The Million Song Dataset consists of metadata and precomputed audio features for one million contemporary songs, making it useful for testing algorithms focused on musical content. AudioSet, created by Google, is a large-scale dataset of over two million human-labeled 10-second clips drawn from YouTube videos, spanning a broad ontology of sound categories that includes music, speech, and environmental sounds. This breadth makes it well suited to evaluating audio classification and search systems across a wide range of sound types.
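AudioSet is distributed as CSV files of labeled segments rather than raw audio, so benchmarking usually starts by parsing those rows into (clip, time range, labels) records. The sketch below parses a small inline excerpt in that CSV style; the YouTube IDs are made up, and the label IDs are illustrative placeholders rather than guaranteed ontology entries.

```python
import csv
import io

# Hypothetical excerpt in the style of AudioSet's segment CSVs:
# comment lines, then rows of YouTube ID, start/end seconds, and a
# quoted, comma-separated list of label IDs (IDs here are illustrative).
SAMPLE = """\
# Segments csv
YTID,start_seconds,end_seconds,positive_labels
abc123,30.000,40.000,"/m/04rlf,/m/09x0r"
def456,0.000,10.000,"/m/03m9d0z"
"""

def parse_segments(text):
    """Parse AudioSet-style segment rows into (ytid, start, end, labels)."""
    rows = []
    for line in csv.reader(io.StringIO(text)):
        # Skip blank lines, comment lines, and the header row.
        if not line or line[0].startswith("#") or line[0] == "YTID":
            continue
        ytid, start, end, labels = line
        rows.append((ytid, float(start), float(end), labels.split(",")))
    return rows

segments = parse_segments(SAMPLE)
```

Because the label field is quoted, `csv.reader` keeps the comma-separated label IDs together as one field, which the parser then splits into a list.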
Another frequently used dataset is the Free Music Archive (FMA), a collection of freely licensed music tracks. FMA ships in several subsets (small, medium, large, and full) to suit different benchmarking budgets, and each track carries genre, artist, and other metadata, making the dataset useful for developing algorithms that require semantic understanding of music. This variety lets developers simulate realistic retrieval scenarios such as genre classification or similarity search.
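Similarity search over a dataset like FMA typically means comparing per-track feature vectors (for example, averaged spectral features) and returning the closest match. As a minimal sketch, the snippet below runs a cosine-similarity nearest-neighbor query over a toy library; the track names and feature values are invented for illustration, not taken from FMA.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy feature vectors standing in for per-track audio features
# (e.g. averaged MFCCs); names and values are made up for illustration.
library = {
    "track_a": [0.9, 0.1, 0.0],
    "track_b": [0.8, 0.2, 0.1],
    "track_c": [0.0, 0.1, 0.9],
}

def nearest(query, library):
    """Return the library track whose features are most similar to the query."""
    return max(library, key=lambda name: cosine(query, library[name]))

best = nearest([0.85, 0.15, 0.05], library)
```

A real benchmark would replace the toy vectors with features extracted from the audio itself and report retrieval metrics (precision@k, mean average precision) over labeled queries.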
In addition to these, developers might also look at datasets like ESC-50 or UrbanSound, which target specific acoustic environments. ESC-50 contains 2,000 five-second environmental recordings organized into 50 categories, while UrbanSound focuses on urban sounds, including sirens and public transport noises. These datasets help assess how well an audio search algorithm can identify and retrieve specific sound types in varied contexts. Ultimately, selecting a dataset depends on the intended application and on which features of the audio search algorithm are being benchmarked.
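One practical detail when working with ESC-50 is that each clip's metadata is encoded directly in its filename, in the pattern {FOLD}-{CLIP_ID}-{TAKE}-{TARGET}.wav, where TARGET is the numeric class label. Assuming that convention, a small helper can recover the fold and label without consulting the metadata CSV:

```python
def parse_esc50_name(filename):
    """Split an ESC-50-style filename into its encoded fields.

    Assumes the {FOLD}-{CLIP_ID}-{TAKE}-{TARGET}.wav naming convention,
    where FOLD is the cross-validation fold (1-5) and TARGET the
    numeric class label (0-49).
    """
    stem = filename.rsplit(".", 1)[0]
    fold, clip_id, take, target = stem.split("-")
    return {
        "fold": int(fold),
        "clip_id": clip_id,
        "take": take,
        "target": int(target),
    }

info = parse_esc50_name("1-100032-A-0.wav")
```

The fold field is what makes ESC-50 convenient for benchmarking: the dataset's predefined 5-fold split can be reconstructed from filenames alone, so results stay comparable across papers.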