Search Results - "Klejch, Ondrej"
-
1
Adaptation Algorithms for Neural Network-Based Speech Recognition: An Overview
Published in IEEE Open Journal of Signal Processing (2021)“…We present a structured overview of adaptation algorithms for neural network-based speech recognition, considering both hybrid hidden Markov model / neural…”
Journal Article -
2
Sequence-to-sequence models for punctuated transcription combining lexical and acoustic features
Published in 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (01-03-2017)“…In this paper we present an extension of our previously described neural machine translation based system for punctuated transcription. This extension allows…”
Conference Proceeding -
3
Punctuated transcription of multi-genre broadcasts using acoustic and lexical approaches
Published in 2016 IEEE Spoken Language Technology Workshop (SLT) (01-12-2016)“…In this paper we investigate the punctuated transcription of multi-genre broadcast media. We examine four systems, three of which are based on lexical…”
Conference Proceeding -
4
Ava Active Speaker: An Audio-Visual Dataset for Active Speaker Detection
Published in ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (01-05-2020)“…Active speaker detection is an important component in video analysis algorithms for applications such as speaker diarization, video re-targeting for meetings,…”
Conference Proceeding -
5
Towards Zero-Shot Code-Switched Speech Recognition
Published in ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (04-06-2023)“…In this work, we seek to build effective code-switched (CS) automatic speech recognition systems (ASR) under the zero-shot setting where no transcribed CS…”
Conference Proceeding -
6
The Edinburgh International Accents of English Corpus: Towards the Democratization of English ASR
Published in ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (04-06-2023)“…English is the most widely spoken language in the world, used daily by millions of people as a first or second language in many different contexts. As a…”
Conference Proceeding -
7
Efficient Intelligibility Evaluation Using Keyword Spotting: A Study on Audio-Visual Speech Enhancement
Published in ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (04-06-2023)“…We propose a new method for human speech intelligibility evaluation based on keyword spotting. In this method, participants play a stimulus and select the word…”
Conference Proceeding -
8
Speech Collage: Code-Switched Audio Generation by Collaging Monolingual Corpora
Published in ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (14-04-2024)“…Designing effective automatic speech recognition (ASR) systems for Code-Switching (CS) often depends on the availability of the transcribed CS resources. To…”
Conference Proceeding -
9
Beyond Oversmoothing: Evaluating DDPM and MSE for Scalable Speech Synthesis in ASR
Published 16-10-2024“…Synthetically generated speech has rapidly approached human levels of naturalness. However, the paradox remains that ASR systems, when trained on TTS output…”
Journal Article -
10
Exploring Dominant Paths in CTC-Like ASR Models: Unraveling the Effectiveness of Viterbi Decoding
Published in 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW) (14-04-2024)“…Connectionist Temporal Classification (CTC) has emerged as a fundamental technique in Automatic Speech Recognition (ASR), renowned for its ability to…”
Conference Proceeding -
11
Speaker Adaptive Training Using Model Agnostic Meta-Learning
Published in 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) (01-12-2019)“…Speaker adaptive training (SAT) of neural network acoustic models learns models in a way that makes them more suitable for adaptation to test conditions…”
Conference Proceeding -
12
Deciphering Speech: a Zero-Resource Approach to Cross-Lingual Transfer in ASR
Published 12-11-2021“…We present a method for cross-lingual training an ASR system using absolutely no transcribed training data from the target language, and with no phonetic…”
Journal Article -
13
Acoustic Word Embeddings for Untranscribed Target Languages with Continued Pretraining and Learned Pooling
Published 03-06-2023“…Acoustic word embeddings are typically created by training a pooling function using pairs of word-like units. For unsupervised systems, these are mined using…”
Journal Article -
14
ASR and Emotional Speech: A Word-Level Investigation of the Mutual Impact of Speech and Emotion Recognition
Published 25-05-2023“…In Speech Emotion Recognition (SER), textual data is often used alongside audio signals to address their inherent variability. However, the reliance on human…”
Journal Article -
15
Towards Zero-Shot Code-Switched Speech Recognition
Published 02-11-2022“…In this work, we seek to build effective code-switched (CS) automatic speech recognition systems (ASR) under the zero-shot setting where no transcribed CS…”
Journal Article -
16
Acoustic Model Adaptation from Raw Waveforms with Sincnet
Published in 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) (01-12-2019)“…Raw waveform acoustic modelling has recently gained interest due to neural networks' ability to learn feature extraction, and the potential for finding better…”
Conference Proceeding -
17
AVSE Challenge: Audio-Visual Speech Enhancement Challenge
Published in 2022 IEEE Spoken Language Technology Workshop (SLT) (09-01-2023)“…Audio-visual speech enhancement is the task of improving the quality of a speech signal when video of the speaker is available. It opens-up the opportunity of…”
Conference Proceeding -
18
The Edinburgh International Accents of English Corpus: Towards the Democratization of English ASR
Published 31-03-2023“…English is the most widely spoken language in the world, used daily by millions of people as a first or second language in many different contexts. As a…”
Journal Article -
19
Hierarchical recurrent neural network for story segmentation using fusion of lexical and acoustic features
Published in 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) (01-12-2017)“…A broadcast news stream consists of a number of stories and it is an important task to find the boundaries of stories automatically in news analysis. We…”
Conference Proceeding -
20
Lattice-Based Unsupervised Test-Time Adaptation of Neural Network Acoustic Models
Published 27-06-2019“…Acoustic model adaptation to unseen test recordings aims to reduce the mismatch between training and testing conditions. Most adaptation schemes for neural…”
Journal Article