Search Results - "Seltzer, Michael L"

  1.

    Deep beamforming networks for multi-channel speech recognition by Xiong Xiao, Watanabe, Shinji, Erdogan, Hakan, Liang Lu, Hershey, John, Seltzer, Michael L., Guoguo Chen, Yu Zhang, Mandel, Michael, Dong Yu

    “…Despite the significant progress in speech recognition enabled by deep neural networks, poor performance persists in some scenarios. In this work, we focus on…”
    Conference Proceeding, Journal Article
  2.

    Deep Neural Networks for Single-Channel Multi-Talker Speech Recognition by Chao Weng, Dong Yu, Seltzer, Michael L., Droppo, Jasha

    “…We investigate techniques based on deep neural networks (DNNs) for attacking the single-channel multi-talker speech recognition problem. Our proposed approach…”
    Journal Article
  3.

    Improving speech recognition in reverberation using a room-aware deep neural network and multi-task learning by Giri, Ritwik, Seltzer, Michael L., Droppo, Jasha, Dong Yu

    “…In this paper, we propose two approaches to improve deep neural network (DNN) acoustic models for speech recognition in reverberant environments. Both methods…”
    Conference Proceeding
  4.

    Reconstruction of missing features for robust speech recognition by Raj, Bhiksha, Seltzer, Michael L., Stern, Richard M.

    Published in Speech communication (01-09-2004)
    “…Speech recognition systems perform poorly in the presence of corrupting noise. Missing feature methods attempt to compensate for the noise by removing noise…”
    Journal Article
  5.

    A study on data augmentation of reverberant speech for robust speech recognition by Ko, Tom, Peddinti, Vijayaditya, Povey, Daniel, Seltzer, Michael L., Khudanpur, Sanjeev

    “…The environmental robustness of DNN-based acoustic models can be significantly improved by using multi-condition training data. However, as data collection is…”
    Conference Proceeding
  6.

    A Bayesian classifier for spectrographic mask estimation for missing feature speech recognition by Seltzer, Michael L., Raj, Bhiksha, Stern, Richard M.

    Published in Speech communication (01-09-2004)
    “…Missing feature methods of noise compensation for speech recognition operate by first identifying components of a spectrographic representation of speech that…”
    Journal Article
  7.

    An investigation of deep neural networks for noise robust speech recognition by Seltzer, Michael L., Dong Yu, Yongqiang Wang

    “…Recently, a new acoustic model based on deep neural networks (DNN) has been introduced. While the DNN has generated significant improvements over GMM-based…”
    Conference Proceeding
  8.

    Transformer-Based Acoustic Modeling for Hybrid Speech Recognition by Wang, Yongqiang, Mohamed, Abdelrahman, Le, Duc, Liu, Chunxi, Xiao, Alex, Mahadeokar, Jay, Huang, Hongzhao, Tjandra, Andros, Zhang, Xiaohui, Zhang, Frank, Fuegen, Christian, Zweig, Geoffrey, Seltzer, Michael L.

    “…We propose and evaluate transformer-based acoustic models (AMs) for hybrid speech recognition. Several modeling choices are discussed in this work, including…”
    Conference Proceeding
  9.

    Alignment Restricted Streaming Recurrent Neural Network Transducer by Mahadeokar, Jay, Shangguan, Yuan, Le, Duc, Keren, Gil, Su, Hang, Le, Thong, Yeh, Ching-Feng, Fuegen, Christian, Seltzer, Michael L.

    “…There is a growing interest in the speech community in developing Recurrent Neural Network Transducer (RNN-T) models for automatic speech recognition (ASR)…”
    Conference Proceeding
  10.

    Toward Human Parity in Conversational Speech Recognition by Xiong, Wayne, Droppo, Jasha, Xuedong Huang, Seide, Frank, Seltzer, Michael L., Stolcke, Andreas, Dong Yu, Zweig, Geoffrey

    “…Conversational speech recognition has served as a flagship speech recognition task since the release of the Switchboard corpus in the 1990s. In this paper, we…”
    Journal Article
  11.

    Training Wideband Acoustic Models Using Mixed-Bandwidth Training Data for Speech Recognition by Seltzer, M.L., Acero, A.

    “…One serious difficulty in the deployment of wideband speech recognition systems for new tasks is the expense in both time and cost of obtaining sufficient…”
    Journal Article
  12.

    Improved Neural Language Model Fusion for Streaming Recurrent Neural Network Transducer by Kim, Suyoun, Shangguan, Yuan, Mahadeokar, Jay, Bruguier, Antoine, Fuegen, Christian, Seltzer, Michael L., Le, Duc

    “…Recurrent Neural Network Transducer (RNN-T), like most end-to-end speech recognition model architectures, has an implicit neural network language model (NNLM)…”
    Conference Proceeding
  13.

    AIPNet: Generative Adversarial Pre-Training of Accent-Invariant Networks for End-To-End Speech Recognition by Chen, Yi-Chen, Yang, Zhaojun, Yeh, Ching-Feng, Jain, Mahaveer, Seltzer, Michael L.

    “…As one of the major sources in speech variability, accents have posed a grand challenge to the robustness of speech recognition systems. In this paper, our…”
    Conference Proceeding
  14.

    Factorized Blank Thresholding for Improved Runtime Efficiency of Neural Transducers by Le, Duc, Seide, Frank, Wang, Yuhao, Li, Yang, Schubert, Kjell, Kalinli, Ozlem, Seltzer, Michael L.

    “…We show how factoring the RNN-T's output distribution can significantly reduce the computation cost and power consumption for on-device ASR inference with no…”
    Conference Proceeding
  15.

    G2G: TTS-Driven Pronunciation Learning for Graphemic Hybrid ASR by Le, Duc, Koehler, Thilo, Fuegen, Christian, Seltzer, Michael L.

    “…Grapheme-based acoustic modeling has recently been shown to outperform phoneme-based approaches in both hybrid and end-to-end automatic speech recognition…”
    Conference Proceeding
  16.

    Massively Multilingual ASR on 70 Languages: Tokenization, Architecture, and Generalization Capabilities by Tjandra, Andros, Singhal, Nayan, Zhang, David, Kalinli, Ozlem, Mohamed, Abdelrahman, Le, Duc, Seltzer, Michael L.

    “…End-to-end multilingual ASR has become more appealing because of several reasons such as simplifying the training and deployment process and positive…”
    Conference Proceeding
  17.

    Memory-Efficient Speech Recognition on Smart Devices by Venkatesh, Ganesh, Valliappan, Alagappan, Mahadeokar, Jay, Shangguan, Yuan, Fuegen, Christian, Seltzer, Michael L., Chandra, Vikas

    “…Recurrent transducer models have emerged as a promising solution for speech recognition on the current and next generation smart devices. The transducer models…”
    Conference Proceeding
  18.

    Improving fast-slow Encoder based Transducer with Streaming Deliberation by Li, Ke, Mahadeokar, Jay, Guo, Jinxi, Shi, Yangyang, Keren, Gil, Kalinli, Ozlem, Seltzer, Michael L., Le, Duc

    “…This paper introduces a fast-slow encoder based transducer with streaming deliberation for end-to-end automatic speech recognition. We aim to improve the…”
    Conference Proceeding
  19.

    Deep neural network features and semi-supervised training for low resource speech recognition by Thomas, Samuel, Seltzer, Michael L., Church, Kenneth, Hermansky, Hynek

    “…We propose a new technique for training deep neural networks (DNNs) as data-driven feature front-ends for large vocabulary continuous speech recognition…”
    Conference Proceeding
  20.

    Neural-FST Class Language Model for End-to-End Speech Recognition by Bruguier, Antoine, Le, Duc, Prabhavalkar, Rohit, Li, Dangna, Liu, Zhe, Wang, Bo, Chang, Eun, Peng, Fuchun, Kalinli, Ozlem, Seltzer, Michael L.

    “…We propose Neural-FST Class Language Model (NFCLM) for end-to-end speech recognition, a novel method that combines neural network language models (NNLMs) and…”
    Conference Proceeding