Search Results - "Hank Liao"
1
Speaker adaptation of context dependent deep neural networks
Published in 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (01-05-2013) “…There has been little work on examining how deep neural networks may be adapted to speakers for improved speech recognition accuracy. Past work has examined…”
Conference Proceeding
2
A Comparison of End-to-End Models for Long-Form Speech Recognition
Published in 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) (01-12-2019) “…End-to-end automatic speech recognition (ASR) models, including both attention-based models and the recurrent neural network transducer (RNN-T), have shown…”
Conference Proceeding
3
Reducing the computational complexity for whole word models
Published in 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) (01-12-2017) “…In a previous study, we demonstrated the feasibility to build a competitive, greatly simplified, large vocabulary continuous speech recognition system with…”
Conference Proceeding
4
Conformer is All You Need for Visual Speech Recognition
Published in Proceedings of the ... IEEE International Conference on Acoustics, Speech and Signal Processing (1998) (14-04-2024) “…Visual speech recognition models extract visual features in a hierarchical manner. At the lower level, there is a visual front-end with a limited temporal…”
Conference Proceeding
5
RADMM: Recurrent Adaptive Mixture Model with Applications to Domain Robust Language Modeling
Published in 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (01-04-2018) “…We present a new architecture and a training strategy for an adaptive mixture of experts with applications to domain robust language modeling. The proposed…”
Conference Proceeding
6
End-to-End Multi-Person Audio/Visual Automatic Speech Recognition
Published in ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (01-05-2020) “…Traditionally, audio-visual automatic speech recognition has been studied under the assumption that the speaking face on the visual signal is the face matching…”
Conference Proceeding
7
USM-SCD: Multilingual Speaker Change Detection Based on Large Pretrained Foundation Models
Published in ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (14-04-2024) “…We introduce a multilingual speaker change detection model (USM-SCD) that can simultaneously detect speaker turns and perform ASR for 96 languages. This model…”
Conference Proceeding
8
Large scale deep neural network acoustic modeling with semi-supervised training data for YouTube video transcription
Published in 2013 IEEE Workshop on Automatic Speech Recognition and Understanding (01-12-2013) “…YouTube is a highly visited video sharing website where over one billion people watch six billion hours of video every month. Improving accessibility to these…”
Conference Proceeding
9
Recurrent Neural Network Transducer for Audio-Visual Speech Recognition
Published in 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) (01-12-2019) “…This work presents a large-scale audio-visual speech recognition system based on a recurrent neural network transducer (RNN-T) architecture. To support the…”
Conference Proceeding
10
GMM-free DNN acoustic model training
Published in 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (01-05-2014) “…While deep neural networks (DNNs) have become the dominant acoustic model (AM) for speech recognition systems, they are still dependent on Gaussian mixture…”
Conference Proceeding
11
Exemplar-based large vocabulary speech recognition using k-nearest neighbors
Published in 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (01-04-2015) “…This paper describes a large scale exemplar-based acoustic modeling approach for large vocabulary continuous speech recognition. We construct an index of…”
Conference Proceeding
12
On Robustness to Missing Video for Audiovisual Speech Recognition
Published 13-12-2023 “…It has been shown that learning audiovisual features can lead to improved speech recognition performance over audio-only features, especially for noisy speech…”
Journal Article
13
Conformers are All You Need for Visual Speech Recognition
Published 16-02-2023 “…Visual speech recognition models extract visual features in a hierarchical manner. At the lower level, there is a visual front-end with a limited temporal…”
Journal Article
14
Lattice rescoring strategies for long short term memory language models in speech recognition
Published in 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU) (01-12-2017) “…Recurrent neural network (RNN) language models (LMs) and Long Short Term Memory (LSTM) LMs, a variant of RNN LMs, have been shown to outperform traditional…”
Conference Proceeding
15
End-to-End Multi-Person Audio/Visual Automatic Speech Recognition
Published 11-05-2022 “…Traditionally, audio-visual automatic speech recognition has been studied under the assumption that the speaking face on the visual signal is the face matching…”
Journal Article
16
DiarizationLM: Speaker Diarization Post-Processing with Large Language Models
Published 07-01-2024 “…In this paper, we introduce DiarizationLM, a framework to leverage large language models (LLM) to post-process the outputs from a speaker diarization system…”
Journal Article
17
Towards Word-Level End-to-End Neural Speaker Diarization with Auxiliary Network
Published 15-09-2023 “…While standard speaker diarization attempts to answer the question "who spoken when", most of relevant applications in reality are more interested in…”
Journal Article
18
USM-SCD: Multilingual Speaker Change Detection Based on Large Pretrained Foundation Models
Published 14-09-2023 “…We introduce a multilingual speaker change detection model (USM-SCD) that can simultaneously detect speaker turns and perform ASR for 96 languages. This model…”
Journal Article
19
Adversarial Training for Multilingual Acoustic Modeling
Published 17-06-2019 “…Multilingual training has been shown to improve acoustic modeling performance by sharing and transferring knowledge in modeling different languages. Knowledge…”
Journal Article
20
Neural Language Modeling with Visual Features
Published 07-03-2019 “…Multimodal language models attempt to incorporate non-linguistic features for the language modeling task. In this work, we extend a standard recurrent neural…”
Journal Article