Search Results - "Cowan, Meghan"
1
Automated Generation of Domain Specific Kernels
Published 01-01-2021 “…Seamless gains in performance from technology scaling is coming to an end, but many applications rely on hardware and their compilation stacks to continue…”
Dissertation
2
Towards a Standardized Representation for Deep Learning Collective Algorithms
Published in 2024 IEEE Symposium on High-Performance Interconnects (HOTI) (21-08-2024) “…The explosion of machine learning model size has led to its execution on distributed clusters at a very large scale. Many works have tried to optimize the…”
Conference Proceeding
3
Towards a Standardized Representation for Deep Learning Collective Algorithms
Published 20-08-2024 “…The explosion of machine learning model size has led to its execution on distributed clusters at a very large scale. Many works have tried to optimize the…”
Journal Article
4
GC3: An Optimizing Compiler for GPU Collective Communication
Published 27-01-2022 “…Machine learning models made up of millions or billions of parameters are trained and served on large multi-GPU systems. As models grow in size and execute on…”
Journal Article
5
SoK: Opportunities for Software-Hardware-Security Codesign for Next Generation Secure Computing
Published 02-05-2021 “…Users are demanding increased data security. As a result, security is rapidly becoming a first-order design constraint in next generation computing systems…”
Journal Article
6
Porcupine: A Synthesizing Compiler for Vectorized Homomorphic Encryption
Published 19-01-2021 “…Homomorphic encryption (HE) is a privacy-preserving technique that enables computation directly on encrypted data. Despite its promise, HE has seen limited use…”
Journal Article
7
TACCL: Guiding Collective Algorithm Synthesis using Communication Sketches
Published 08-11-2021 “…Machine learning models are increasingly being trained across multiple GPUs and servers. In this setting, data is transferred between GPUs using communication…”
Journal Article
8
Automating Generation of Low Precision Deep Learning Operators
Published 25-10-2018 “…State of the art deep learning models have made steady progress in the fields of computer vision and natural language processing, at the expense of growing…”
Journal Article
9
Exploring computation-communication tradeoffs in camera systems
Published in 2017 IEEE International Symposium on Workload Characterization (IISWC) (01-10-2017) “…Cameras are the defacto sensor. The growing demand for real-time and low-power computer vision, coupled with trends towards high-efficiency heterogeneous…”
Conference Proceeding
10
Analysis and Mitigations of Reverse Engineering Attacks on Local Feature Descriptors
Published 08-05-2021 “…As autonomous driving and augmented reality evolve, a practical concern is data privacy. In particular, these applications rely on localization based on user…”
Journal Article
11
Exploring Computation-Communication Tradeoffs in Camera Systems
Published 12-06-2017 “…2017 IEEE International Symposium on Workload Characterization (IISWC) Cameras are the defacto sensor. The growing demand for real-time and low-power computer…”
Journal Article
12
TVM: An Automated End-to-End Optimizing Compiler for Deep Learning
Published 12-02-2018 “…There is an increasing need to bring machine learning to a wide diversity of hardware devices. Current frameworks rely on vendor-specific operator libraries…”
Journal Article