Optimizing Multi-Taper Features for Deep Speaker Verification - Laboratoire Lorrain de Recherche en Informatique et ses Applications
Journal article, IEEE Signal Processing Letters, Year: 2021

Optimizing Multi-Taper Features for Deep Speaker Verification

Abstract

Multi-taper estimators provide low-variance power spectrum estimates that can be used in place of the windowed discrete Fourier transform (DFT) to extract speech features such as mel-frequency cepstral coefficients (MFCCs). Although past work has reported promising automatic speaker verification (ASV) results with Gaussian mixture model-based classifiers, the performance of multi-taper MFCCs with deep ASV systems remains an open question. Instead of a static-taper design, we propose to optimize the multi-taper estimator jointly with a deep neural network trained for ASV tasks. Our method yields a maximum improvement of 25.8% in equal error rate on the SITW corpus over the static-taper design and helps preserve a balanced level of leakage and variance, providing more robustness.
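As an illustration of the static-taper baseline the abstract refers to, the following is a minimal sketch of a multi-taper power spectrum estimate for one speech frame. It assumes sine tapers with uniform weights, a 400-sample frame (25 ms at 16 kHz), and six tapers; none of these settings are stated on this page, and the function names `sine_tapers` and `multitaper_power_spectrum` are hypothetical. It is not the authors' implementation.

```python
import numpy as np


def sine_tapers(frame_len, num_tapers):
    """Return a (num_tapers, frame_len) array of orthonormal sine tapers.

    Assumption: sine tapers are used here for illustration only; the
    taper family used in the paper is not given on this page.
    """
    n = np.arange(1, frame_len + 1)
    k = np.arange(1, num_tapers + 1)[:, None]
    scale = np.sqrt(2.0 / (frame_len + 1))
    return scale * np.sin(np.pi * k * n / (frame_len + 1))


def multitaper_power_spectrum(frame, tapers, weights=None):
    """Weighted average of the periodograms obtained with each taper."""
    num_tapers = tapers.shape[0]
    if weights is None:
        # Static design: every taper contributes equally.
        weights = np.full(num_tapers, 1.0 / num_tapers)
    # One tapered periodogram per row, using the one-sided FFT.
    spectra = np.abs(np.fft.rfft(tapers * frame, axis=1)) ** 2
    return weights @ spectra


# Illustrative input: a random frame standing in for speech samples.
rng = np.random.default_rng(0)
frame = rng.standard_normal(400)
tapers = sine_tapers(400, 6)
spectrum = multitaper_power_spectrum(frame, tapers)
```

Averaging several tapered periodograms is what lowers the variance relative to a single windowed DFT. In a jointly optimized variant such as the one the abstract proposes, `tapers` and `weights` would be trainable parameters rather than fixed arrays; how the paper parameterizes them is not described here.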
Main file
Multitaper_IEEE_SPL.pdf (286.15 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-03394152, version 1 (22-10-2021)

Cite

Xuechen Liu, Md Sahidullah, Tomi Kinnunen. Optimizing Multi-Taper Features for Deep Speaker Verification. IEEE Signal Processing Letters, 2021, 28, pp. 2187-2191. ⟨10.1109/LSP.2021.3122796⟩. ⟨hal-03394152⟩