Search Results - "SAON, George"

  1.

    Deep Convolutional Neural Networks for Large-scale Speech Tasks by Sainath, Tara N., Kingsbury, Brian, Saon, George, Soltau, Hagen, Mohamed, Abdel-rahman, Dahl, George, Ramabhadran, Bhuvana

    Published in Neural networks (01-04-2015)
    “…Convolutional Neural Networks (CNNs) are an alternative type of neural network that can be used to reduce spectral variations and model spectral correlations…”
    Journal Article
  2.

    Advancing RNN Transducer Technology for Speech Recognition by Saon, George, Tuske, Zoltan, Bolanos, Daniel, Kingsbury, Brian

    “…We investigate a set of techniques for RNN Transducers (RNN-Ts) that were instrumental in lowering the word error rate on three different tasks (Switchboard…”
    Conference Proceeding
  3.

    Building Competitive Direct Acoustics-to-Word Models for English Conversational Speech Recognition by Audhkhasi, Kartik, Kingsbury, Brian, Ramabhadran, Bhuvana, Saon, George, Picheny, Michael

    “…Direct acoustics-to-word (A2W) models in the end-to-end paradigm have received increasing attention compared to conventional subword based automatic speech…”
    Conference Proceeding
  4.

    Analyzing convolutional neural networks for speech activity detection in mismatched acoustic conditions by Thomas, Samuel, Ganapathy, Sriram, Saon, George, Soltau, Hagen

    “…Convolutional neural networks (CNN) are extensions to deep neural networks (DNN) which are used as alternate acoustic models with state-of-the-art performances…”
    Conference Proceeding
  5.

    Joint training of convolutional and non-convolutional neural networks by Soltau, Hagen, Saon, George, Sainath, Tara N.

    “…We describe a simple modification of neural networks which consists in extending the commonly used linear layer structure to an arbitrary graph structure. This…”
    Conference Proceeding
  6.

    Speaker adaptation of neural network acoustic models using i-vectors by Saon, George, Soltau, Hagen, Nahamoo, David, Picheny, Michael

    “…We propose to adapt deep neural network (DNN) acoustic models to a target speaker by supplying speaker identity vectors (i-vectors) as input features to the…”
    Conference Proceeding
  7.

    Alignment-Length Synchronous Decoding for RNN Transducer by Saon, George, Tuske, Zoltan, Audhkhasi, Kartik

    “…We present a beam decoding strategy for recurrent neural network transducers which has the characteristic that all competing hypotheses within the beam have…”
    Conference Proceeding
  8.

    Diagonal State Space Augmented Transformers for Speech Recognition by Saon, George, Gupta, Ankit, Cui, Xiaodong

    “…We improve on the popular conformer architecture by replacing the depthwise temporal convolutions with diagonal state space (DSS) models. DSS is a recently…”
    Conference Proceeding
  9.

    Boosting systems for large vocabulary continuous speech recognition by Saon, George, Soltau, Hagen

    Published in Speech communication (01-02-2012)
    “…► We apply the Adaboost algorithm to large vocabulary continuous speech recognition. ► Acoustic models are trained sequentially on re-weighted data. ► Phonetic…”
    Journal Article
  10.

    Multiple Representation Transfer from Large Language Models to End-to-End ASR Systems by Udagawa, Takuma, Suzuki, Masayuki, Kurata, Gakuto, Muraoka, Masayasu, Saon, George

    “…Transferring the knowledge of large language models (LLMs) is a promising technique to incorporate linguistic knowledge into end-to-end automatic speech…”
    Conference Proceeding
  11.

    Semi-Autoregressive Streaming ASR with Label Context by Arora, Siddhant, Saon, George, Watanabe, Shinji, Kingsbury, Brian

    “…Non-autoregressive (NAR) modeling has gained significant interest in speech processing since these models achieve dramatically lower inference time than…”
    Conference Proceeding
  12.

    Sequence Noise Injected Training for End-to-end Speech Recognition by Saon, George, Tuske, Zoltan, Audhkhasi, Kartik, Kingsbury, Brian

    “…We present a simple noise injection algorithm for training end-to-end ASR models which consists in adding to the spectra of training utterances the scaled…”
    Conference Proceeding
  13.

    Towards Reducing the Need for Speech Training Data to Build Spoken Language Understanding Systems by Thomas, Samuel, Kuo, Hong-Kwang J., Kingsbury, Brian, Saon, George

    “…The lack of speech data annotated with labels required for spoken language understanding (SLU) is often a major hurdle in building end-to-end (E2E) systems…”
    Conference Proceeding
  14.

    Distributed Deep Learning Strategies for Automatic Speech Recognition by Zhang, Wei, Cui, Xiaodong, Finkler, Ulrich, Kingsbury, Brian, Saon, George, Kung, David, Picheny, Michael

    “…In this paper, we propose and investigate a variety of distributed deep learning strategies for automatic speech recognition (ASR) and evaluate them with a…”
    Conference Proceeding
  15.

    Integrating Text Inputs for Training and Adapting RNN Transducer ASR Models by Thomas, Samuel, Kingsbury, Brian, Saon, George, Kuo, Hong-Kwang J.

    “…Compared to hybrid automatic speech recognition (ASR) systems that use a modular architecture in which each component can be independently adapted to a new…”
    Conference Proceeding
  16.

    A nonmonotone learning rate strategy for SGD training of deep neural networks by Keskar, Nitish Shirish, Saon, George

    “…The algorithm of choice for cross-entropy training of deep neural network (DNN) acoustic models is mini-batch stochastic gradient descent (SGD). One of the…”
    Conference Proceeding
  17.

    A comparison of two optimization techniques for sequence discriminative training of deep neural networks by Saon, George, Soltau, Hagen

    “…We compare two optimization methods for lattice-based sequence discriminative training of neural network acoustic models: distributed Hessian-free (DHF) and…”
    Conference Proceeding
  18.
  19.

    Speech Recognition Using Biologically-Inspired Neural Networks by Bohnstingl, Thomas, Garg, Ayush, Wozniak, Stanislaw, Saon, George, Eleftheriou, Evangelos, Pantazi, Angeliki

    “…Automatic speech recognition systems (ASR), such as the recurrent neural network transducer (RNN-T), have reached close to human-like performance and are…”
    Conference Proceeding
  20.

    Multi-Speaker Data Augmentation for Improved end-to-end Automatic Speech Recognition by Thomas, Samuel, Kuo, Hong-Kwang J., Saon, George, Kingsbury, Brian

    “…Publicly available datasets traditionally used to train E2E ASR models for conversational telephone speech recognition are based on clean, short duration,…”
    Conference Proceeding