Feature Normalisation for Robust Speech Recognition

Bibliographic Details
Main Author:	Kumar, D. S. Pavan
Format:	Journal Article
Language:	English
Published:	14-07-2015
Subjects:	Computer Science - Computation and Language; Computer Science - Sound

Abstract	Speech recognition performance degrades in noisy environments. If the acoustic models are trained on features of clean utterances, the features of a noisy test utterance are acoustically mismatched with the trained models, which yields poor likelihoods and poor recognition accuracy. Model adaptation and feature normalisation are two broad approaches to this problem. While the former often gives better performance, the latter requires estimating fewer parameters, which makes it more feasible for practical implementations. This research focuses on the efficacy of various subspace, statistical and stereo-based feature normalisation techniques. A subspace-projection-based method is investigated both as a standalone and as an adjunct technique; it reconstructs noisy speech features from a precomputed set of clean speech building blocks. The building blocks, which form a basis for the clean speech subspace, are learned using non-negative matrix factorisation (NMF) on log-Mel filter bank coefficients. The work provides a detailed study of how the method can be incorporated into the extraction process of Mel-frequency cepstral coefficients. Experimental results show that the new features are robust to noise and achieve better results when combined with existing techniques. The work also proposes a modification to the training process of the SPLICE algorithm for noise-robust speech recognition. The modification is based on feature correlations and enables this stereo-based algorithm to improve performance in all noise conditions, especially unseen ones. Further, the modified framework is extended to non-stereo datasets, which require clean and noisy training utterances but not their stereo counterparts. Finally, a computationally efficient, MLLR-based run-time noise adaptation method within the SPLICE framework is proposed.
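The subspace idea in the abstract (learn clean-speech building blocks with NMF, then reconstruct noisy features from them) can be illustrated with a minimal NumPy sketch. This is not the author's implementation: the function names `learn_basis` and `project`, the rank of 4, the 23 filter bank bands, the use of standard multiplicative updates for the Euclidean NMF objective, and the synthetic data are all assumptions made for illustration. The inputs are also assumed to be non-negative, which real log-Mel coefficients are not guaranteed to be without some offset or flooring.

```python
import numpy as np

EPS = 1e-9  # guards against division by zero in the multiplicative updates


def learn_basis(V, rank, n_iter=200, seed=0):
    """Factorise non-negative V (bands x frames) as W @ H and return W (bands x rank)."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], rank)) + EPS
    H = rng.random((rank, V.shape[1])) + EPS
    for _ in range(n_iter):
        # Multiplicative updates for the squared-error NMF objective.
        H *= (W.T @ V) / (W.T @ W @ H + EPS)
        W *= (V @ H.T) / (W @ H @ H.T + EPS)
    return W


def project(V, W, n_iter=200, seed=0):
    """Reconstruct V from a fixed basis W by updating only the activations H."""
    rng = np.random.default_rng(seed)
    H = rng.random((W.shape[1], V.shape[1])) + EPS
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + EPS)
    return W @ H


# Synthetic stand-ins for filter bank features (hypothetical data, not real speech).
rng = np.random.default_rng(1)
true_basis = rng.random((23, 4))                     # 23 bands, 4 hidden building blocks
clean = true_basis @ rng.random((4, 300))            # "clean" training features
noisy = clean[:, :50] + 0.1 * rng.random((23, 50))   # crude additive distortion

W = learn_basis(clean, rank=4)   # learned once, offline, from clean data
recon = project(noisy, W)        # noisy frames expressed in the clean subspace
print(recon.shape)
```

In a feature extraction pipeline, a reconstruction like `recon` would presumably replace the noisy filter bank outputs before the cepstral transform; where exactly that substitution happens is the design question the abstract says the work studies in detail.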
Copyright	http://creativecommons.org/licenses/by-sa/4.0
DOI	10.48550/arxiv.1507.04019
OpenAccessLink	https://arxiv.org/abs/1507.04019