Search Results - "Jablin, Thomas B."
1
Ten Lessons From Three Generations Shaped Google's TPUv4i : Industrial Product
Published in 2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture (ISCA) (01-06-2021) “…Google deployed several TPU generations since 2015, teaching us lessons that changed our views: semi-conductor technology advances unequally; compiler…”
Conference Proceeding
2
MLPerf Inference Benchmark
Published in 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA) (01-05-2020) “…Machine-learning (ML) hardware and software system demand is burgeoning. Driven by ML applications, the number of different ML inference systems has exploded…”
Conference Proceeding
3
Automatic execution of single-GPU computations across multiple GPUs
Published in 2014 23rd International Conference on Parallel Architecture and Compilation Techniques (PACT) (24-08-2014) “…We present AMGE, a programming framework and runtime system to decompose data and GPU kernels and execute them on multiple GPUs concurrently. AMGE exploits the…”
Conference Proceeding
4
Warp-aware trace scheduling for GPUs
Published in 2014 23rd International Conference on Parallel Architecture and Compilation Techniques (PACT) (01-08-2014) “…GPU performance depends not only on thread/warp level parallelism (TLP) but also on instruction-level parallelism (ILP). It is not enough to schedule…”
Conference Proceeding
5
Speculatively exploiting cross-invocation parallelism
Published in 2016 International Conference on Parallel Architecture and Compilation Techniques (PACT) (01-09-2016) “…Automatic parallelization has shown promise in producing scalable multi-threaded programs for multi-core architectures. Most existing automatic techniques…”
Conference Proceeding
6
Automatic Parallelization for GPUs
Published 01-01-2013 “…GPUs are flexible parallel processors capable of accelerating real applications. To exploit them, programmers rewrite programs in new languages using intimate…”
Dissertation
7
Chai: Collaborative heterogeneous applications for integrated-architectures
Published in 2017 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) (01-04-2017) “…Heterogeneous system architectures are evolving towards tighter integration among devices, with emerging features such as shared virtual memory, memory…”
Conference Proceeding
8
A collaborative dependence analysis framework
Published in 2017 IEEE/ACM International Symposium on Code Generation and Optimization (CGO) (01-02-2017) “…Compiler optimizations discover facts about program behavior by querying static analysis. However, developing or extending precise analysis is difficult. Some…”
Conference Proceeding
9
Automatic Parallelization for GPUs
Dissertation
10
Automatically exploiting cross-invocation parallelism using runtime information
Published in Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization (CGO) (23-02-2013) “…Automatic parallelization is a promising approach to producing scalable multi-threaded programs for multicore architectures. Many existing automatic techniques…”
Conference Proceeding
11
MLPerf Inference Benchmark
Published 06-11-2019 “…Machine-learning (ML) hardware and software system demand is burgeoning. Driven by ML applications, the number of different ML inference systems has exploded…”
Journal Article
12
A survey of the practice of computational science
Published in 2011 International Conference for High Performance Computing, Networking, Storage and Analysis (SC) (01-11-2011) “…Computing plays an indispensable role in scientific research. Presently, researchers in science have different problems, needs, and beliefs about computation…”
Conference Proceeding
13
Lingvo: a Modular and Scalable Framework for Sequence-to-Sequence Modeling
Published 21-02-2019 “…Lingvo is a Tensorflow framework offering a complete solution for collaborative deep learning research, with a particular focus towards sequence-to-sequence…”
Journal Article