Search Results - "Keskar, Nitish Shirish"
1
Deep learning-enabled breast cancer hormonal receptor status determination from base-level H&E stains
Published in Nature communications (16-11-2020)“…For newly diagnosed breast cancer, estrogen receptor status (ERS) is a key molecular marker used for prognosis and treatment decisions. During clinical…”
Journal Article
2
Balancing Communication and Computation in Distributed Optimization
Published in IEEE transactions on automatic control (01-08-2019)“…Methods for distributed optimization have received significant attention in recent years owing to their wide applicability in various domains including machine…”
Journal Article
3
Limits of Detecting Text Generated by Large-Scale Language Models
Published in 2020 Information Theory and Applications Workshop (ITA) (02-02-2020)“…Some consider large-scale language models that can generate long and coherent pieces of text as dangerous, since they may be used in misinformation campaigns…”
Conference Proceeding
4
A nonmonotone learning rate strategy for SGD training of deep neural networks
Published in 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (01-04-2015)“…The algorithm of choice for cross-entropy training of deep neural network (DNN) acoustic models is mini-batch stochastic gradient descent (SGD). One of the…”
Conference Proceeding
5
Deep learning-enabled breast cancer hormonal receptor status determination from base-level H&E stains
Published in Nature communications (01-11-2020)“…Determination of estrogen receptor status (ERS) in breast cancer tissue requires immunohistochemistry, which is sensitive to the vagaries of sample processing…”
Journal Article
6
Second-Order Methods for Stochastic and Nonsmooth Optimization
Published 2017“…The goal of this thesis is to design practical algorithms for nonlinear optimization in the case when the objective function is stochastic or nonsmooth. The…”
Dissertation
7
Improving Generalization Performance by Switching from Adam to SGD
Published 20-12-2017“…Despite superior training outcomes, adaptive optimization methods such as Adam, Adagrad or RMSprop have been found to generalize poorly compared to Stochastic…”
Journal Article
8
Generating Negative Samples for Sequential Recommendation
Published 07-08-2022“…To make Sequential Recommendation (SR) successful, recent works focus on designing effective sequential encoders, fusing side information, and mining extra…”
Journal Article
9
Modeling Multi-hop Question Answering as Single Sequence Prediction
Published 18-05-2022“…Fusion-in-decoder (Fid) (Izacard and Grave, 2020) is a generative question answering (QA) model that leverages passage retrieval with a pre-trained transformer…”
Journal Article
10
A Limited-Memory Quasi-Newton Algorithm for Bound-Constrained Nonsmooth Optimization
Published 21-12-2016“…We consider the problem of minimizing a continuous function that may be nonsmooth and nonconvex, subject to bound constraints. We propose an algorithm that…”
Journal Article
11
Limits of Detecting Text Generated by Large-Scale Language Models
Published 09-02-2020“…Some consider large-scale language models that can generate long and coherent pieces of text as dangerous, since they may be used in misinformation campaigns…”
Journal Article
12
An Analysis of Neural Language Modeling at Multiple Scales
Published 22-03-2018“…Many of the leading approaches in language modeling introduce novel, complex and specialized architectures. We take existing state-of-the-art word level…”
Journal Article
13
Pretrained AI Models: Performativity, Mobility, and Change
Published 07-09-2019“…The paradigm of pretrained deep learning models has recently emerged in artificial intelligence practice, allowing deployment in numerous societal settings…”
Journal Article
14
Weighted Transformer Network for Machine Translation
Published 06-11-2017“…State-of-the-art results on neural machine translation often use attentional sequence-to-sequence models with some form of convolution or recursion. Vaswani et…”
Journal Article
15
Unifying Question Answering, Text Classification, and Regression via Span Extraction
Published 19-04-2019“…Even as pre-trained language encoders such as BERT are shared across many tasks, the output layers of question answering, text classification, and regression…”
Journal Article
16
Regularizing and Optimizing LSTM Language Models
Published 07-08-2017“…Recurrent neural networks (RNNs), such as long short-term memory networks (LSTMs), serve as a fundamental building block for many sequence learning tasks,…”
Journal Article
17
Unsupervised Paraphrasing with Pretrained Language Models
Published 24-10-2020“…Paraphrase generation has benefited extensively from recent progress in the designing of training objectives and model architectures. However, previous…”
Journal Article
18
Coarse-grain Fine-grain Coattention Network for Multi-evidence Question Answering
Published 02-01-2019“…End-to-end neural models have made significant progress in question answering, however recent studies show that these models implicitly assume that the answer…”
Journal Article
19
Mirostat: A Neural Text Decoding Algorithm that Directly Controls Perplexity
Published 29-07-2020“…Neural text decoding is important for generating high-quality texts using language models. To generate high-quality text, popular decoding algorithms like…”
Journal Article
20
A Closer Look at Deep Learning Heuristics: Learning rate restarts, Warmup and Distillation
Published 29-10-2018“…The convergence rate and final performance of common deep learning models have significantly benefited from heuristics such as learning rate schedules,…”
Journal Article