Audio signal representations for indexing in the transform domain - Institut Langevin
Journal article, IEEE Transactions on Audio, Speech and Language Processing, Year: 2010

Audio signal representations for indexing in the transform domain

Abstract

Indexing audio signals directly in the transform domain can potentially save a significant amount of computation when working with a large database of signals stored in a lossy compression format, because the signals do not have to be fully decoded. Here, we show that the representations used in standard transform-based audio codecs (e.g. MDCT for AAC, or hybrid PQF/MDCT for MP3) have sufficient time resolution for some rhythmic features but poor frequency resolution, which prevents their use in tonality-related applications. By contrast, a recently developed audio codec based on a sparse multi-scale MDCT transform offers good resolution for both time- and frequency-domain features. We show that this new audio codec allows efficient transform-domain audio indexing for three applications: beat tracking, chord recognition, and musical genre classification. We compare the results obtained with this new codec against those of the two standard codecs, MP3 and AAC, in terms of performance and computation time.
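
As a rough illustration of the time/frequency trade-off described in the abstract, the sketch below computes the MDCT of a single frame and the nominal hop duration and bin spacing for a given frame size. It is not taken from the paper: the function names `mdct_frame` and `resolution`, the sine window, and the 44.1 kHz sample rate are assumptions made for the example. The frame sizes of 1024 and 128 coefficients correspond to AAC long and short blocks.

```python
import numpy as np

def mdct_frame(frame):
    """MDCT of one frame of 2N samples using a sine window (assumed); returns N coefficients."""
    two_n = len(frame)
    n_coef = two_n // 2
    n = np.arange(two_n)
    k = np.arange(n_coef)
    # Sine analysis window over the 2N-sample frame.
    window = np.sin(np.pi * (n + 0.5) / two_n)
    # Direct O(N^2) evaluation of the MDCT cosine basis, for clarity rather than speed.
    basis = np.cos(np.pi / n_coef * np.outer(k + 0.5, n + 0.5 + n_coef / 2))
    return basis @ (window * frame)

def resolution(n_coef, sample_rate):
    """Nominal (hop duration in ms, bin spacing in Hz) for N coefficients per frame."""
    hop_ms = 1000.0 * n_coef / sample_rate
    bin_hz = sample_rate / (2.0 * n_coef)
    return hop_ms, bin_hz

if __name__ == "__main__":
    for n_coef in (1024, 128):  # AAC long and short blocks
        hop_ms, bin_hz = resolution(n_coef, 44100)
        print(f"N={n_coef}: hop {hop_ms:.2f} ms, bin spacing {bin_hz:.2f} Hz")
```

At 44.1 kHz, 1024 coefficients per frame give a hop of about 23.2 ms and a bin spacing of about 21.5 Hz, while 128 coefficients give about 2.9 ms and 172.3 Hz. A fixed frame size therefore trades one kind of resolution against the other.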
Main file
TSALP_ravelli10.pdf (495.68 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-02652798, version 1 (29-05-2020)

Identifiers

  • HAL Id: hal-02652798, version 1

Cite

Emmanuel Ravelli, Gael Richard, Laurent Daudet. Audio signal representations for indexing in the transform domain. IEEE Transactions on Audio, Speech and Language Processing, 2010. ⟨hal-02652798⟩
