Search Results - "Liao, Hank"

  1.

    Speaker adaptation of context dependent deep neural networks by Liao, Hank

    “…There has been little work on examining how deep neural networks may be adapted to speakers for improved speech recognition accuracy. Past work has examined…”
    Conference Proceeding
  2.

    A Comparison of End-to-End Models for Long-Form Speech Recognition by Chiu, Chung-Cheng, Han, Wei, Zhang, Yu, Pang, Ruoming, Kishchenko, Sergey, Nguyen, Patrick, Narayanan, Arun, Liao, Hank, Zhang, Shuyuan, Kannan, Anjuli, Prabhavalkar, Rohit, Chen, Zhifeng, Sainath, Tara, Wu, Yonghui

    “…End-to-end automatic speech recognition (ASR) models, including both attention-based models and the recurrent neural network transducer (RNN-T), have shown…”
    Conference Proceeding
  3.

    Reducing the computational complexity for whole word models by Soltau, Hagen, Liao, Hank, Sak, Hasim

    “…In a previous study, we demonstrated the feasibility to build a competitive, greatly simplified, large vocabulary continuous speech recognition system with…”
    Conference Proceeding
  4.

    Conformer is All You Need for Visual Speech Recognition by Chang, Oscar, Liao, Hank, Serdyuk, Dmitriy, Shah, Ankit, Siohan, Olivier

    “…Visual speech recognition models extract visual features in a hierarchical manner. At the lower level, there is a visual front-end with a limited temporal…”
    Conference Proceeding
  5.

    RADMM: Recurrent Adaptive Mixture Model with Applications to Domain Robust Language Modeling by Irie, Kazuki, Kumar, Shankar, Nirschl, Michael, Liao, Hank

    “…We present a new architecture and a training strategy for an adaptive mixture of experts with applications to domain robust language modeling. The proposed…”
    Conference Proceeding
  6.

    End-to-End Multi-Person Audio/Visual Automatic Speech Recognition by Braga, Otavio, Makino, Takaki, Siohan, Olivier, Liao, Hank

    “…Traditionally, audio-visual automatic speech recognition has been studied under the assumption that the speaking face on the visual signal is the face matching…”
    Conference Proceeding
  7.

    USM-SCD: Multilingual Speaker Change Detection Based on Large Pretrained Foundation Models by Zhao, Guanlong, Wang, Yongqiang, Pelecanos, Jason, Zhang, Yu, Liao, Hank, Huang, Yiling, Lu, Han, Wang, Quan

    “…We introduce a multilingual speaker change detection model (USM-SCD) that can simultaneously detect speaker turns and perform ASR for 96 languages. This model…”
    Conference Proceeding
  8.

    Large scale deep neural network acoustic modeling with semi-supervised training data for YouTube video transcription by Liao, Hank, McDermott, Erik, Senior, Andrew

    “…YouTube is a highly visited video sharing website where over one billion people watch six billion hours of video every month. Improving accessibility to these…”
    Conference Proceeding
  9.

    Recurrent Neural Network Transducer for Audio-Visual Speech Recognition by Makino, Takaki, Liao, Hank, Assael, Yannis, Shillingford, Brendan, Garcia, Basilio, Braga, Otavio, Siohan, Olivier

    “…This work presents a large-scale audio-visual speech recognition system based on a recurrent neural network transducer (RNN-T) architecture. To support the…”
    Conference Proceeding
  10.

    GMM-free DNN acoustic model training by Senior, Andrew, Heigold, Georg, Bacchiani, Michiel, Liao, Hank

    “…While deep neural networks (DNNs) have become the dominant acoustic model (AM) for speech recognition systems, they are still dependent on Gaussian mixture…”
    Conference Proceeding
  11.

    Exemplar-based large vocabulary speech recognition using k-nearest neighbors by Xu, Yanbo, Siohan, Olivier, Simcha, David, Kumar, Sanjiv, Liao, Hank

    “…This paper describes a large scale exemplar-based acoustic modeling approach for large vocabulary continuous speech recognition. We construct an index of…”
    Conference Proceeding
  12.

    On Robustness to Missing Video for Audiovisual Speech Recognition by Chang, Oscar, Braga, Otavio, Liao, Hank, Serdyuk, Dmitriy, Siohan, Olivier

    Published 13-12-2023
    “…It has been shown that learning audiovisual features can lead to improved speech recognition performance over audio-only features, especially for noisy speech…”
    Journal Article
  13.

    Conformers are All You Need for Visual Speech Recognition by Chang, Oscar, Liao, Hank, Serdyuk, Dmitriy, Shah, Ankit, Siohan, Olivier

    Published 16-02-2023
    “…Visual speech recognition models extract visual features in a hierarchical manner. At the lower level, there is a visual front-end with a limited temporal…”
    Journal Article
  14.

    Lattice rescoring strategies for long short term memory language models in speech recognition by Kumar, Shankar, Nirschl, Michael, Holtmann-Rice, Daniel, Liao, Hank, Suresh, Ananda Theertha, Yu, Felix

    “…Recurrent neural network (RNN) language models (LMs) and Long Short Term Memory (LSTM) LMs, a variant of RNN LMs, have been shown to outperform traditional…”
    Conference Proceeding
  15.

    End-to-End Multi-Person Audio/Visual Automatic Speech Recognition by Braga, Otavio, Makino, Takaki, Siohan, Olivier, Liao, Hank

    Published 11-05-2022
    “…Traditionally, audio-visual automatic speech recognition has been studied under the assumption that the speaking face on the visual signal is the face matching…”
    Journal Article
  16.

    DiarizationLM: Speaker Diarization Post-Processing with Large Language Models by Wang, Quan, Huang, Yiling, Zhao, Guanlong, Clark, Evan, Xia, Wei, Liao, Hank

    Published 07-01-2024
    “…In this paper, we introduce DiarizationLM, a framework to leverage large language models (LLM) to post-process the outputs from a speaker diarization system…”
    Journal Article
  17.

    Towards Word-Level End-to-End Neural Speaker Diarization with Auxiliary Network by Huang, Yiling, Wang, Weiran, Zhao, Guanlong, Liao, Hank, Xia, Wei, Wang, Quan

    Published 15-09-2023
    “…While standard speaker diarization attempts to answer the question "who spoken when", most of relevant applications in reality are more interested in…”
    Journal Article
  18.

    USM-SCD: Multilingual Speaker Change Detection Based on Large Pretrained Foundation Models by Zhao, Guanlong, Wang, Yongqiang, Pelecanos, Jason, Zhang, Yu, Liao, Hank, Huang, Yiling, Lu, Han, Wang, Quan

    Published 14-09-2023
    “…We introduce a multilingual speaker change detection model (USM-SCD) that can simultaneously detect speaker turns and perform ASR for 96 languages. This model…”
    Journal Article
  19.

    Adversarial Training for Multilingual Acoustic Modeling by Hu, Ke, Sak, Hasim, Liao, Hank

    Published 17-06-2019
    “…Multilingual training has been shown to improve acoustic modeling performance by sharing and transferring knowledge in modeling different languages. Knowledge…”
    Journal Article
  20.

    Neural Language Modeling with Visual Features by Anastasopoulos, Antonios, Kumar, Shankar, Liao, Hank

    Published 07-03-2019
    “…Multimodal language models attempt to incorporate non-linguistic features for the language modeling task. In this work, we extend a standard recurrent neural…”
    Journal Article