Relax: Composable Abstractions for End-to-End Dynamic Machine Learning

Bibliographic Details
Main Authors: Lai, Ruihang, Shao, Junru, Feng, Siyuan, Lyubomirsky, Steven S, Hou, Bohan, Lin, Wuwei, Ye, Zihao, Jin, Hongyi, Jin, Yuchen, Liu, Jiawei, Jin, Lesheng, Cai, Yaxing, Jiang, Ziheng, Wu, Yong, Park, Sunghyun, Srivastava, Prakalp, Roesch, Jared G, Mowry, Todd C, Chen, Tianqi
Format: Journal Article
Language: English
Published: 01-11-2023
Summary: Dynamic shape computations have become critical in modern machine learning workloads, especially in emerging large language models. The success of these models has driven demand for deploying them to a diverse set of backend environments. In this paper, we present Relax, a compiler abstraction for optimizing end-to-end dynamic machine learning workloads. Relax introduces first-class symbolic shape annotations to track dynamic shape computations globally across the program. It also introduces a cross-level abstraction that encapsulates computational graphs, loop-level tensor programs, and library calls in a single representation to enable cross-level optimizations. We build an end-to-end compilation framework using the proposed approach to optimize dynamic shape models. Experimental results on large language models show that Relax delivers performance competitive with state-of-the-art hand-optimized systems across platforms and enables deployment of emerging dynamic models to a broader set of environments, including mobile phones, embedded devices, and web browsers.
DOI: 10.48550/arxiv.2311.02103
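
The summary's core idea of first-class symbolic shape annotations can be illustrated with a minimal sketch. This is not the actual Relax API; it is a hypothetical, self-contained Python model of how a symbolic dimension (e.g. a dynamic sequence length `n`) can be tracked structurally through operators rather than erased to "unknown":

```python
# Conceptual sketch only (not the real Relax/TVM API): propagating a
# symbolic shape variable through a chain of matmuls, as first-class
# symbolic shape annotations do in the paper's abstraction.
from dataclasses import dataclass

@dataclass(frozen=True)
class SymVar:
    """A named symbolic dimension, e.g. a dynamic sequence length."""
    name: str

# A shape is a tuple whose elements are concrete ints or symbolic vars.
def matmul_shape(a: tuple, b: tuple) -> tuple:
    # (m, k) @ (k, n) -> (m, n); inner dims must match, symbolically or not.
    assert a[1] == b[0], f"inner dims differ: {a[1]} vs {b[0]}"
    return (a[0], b[1])

n = SymVar("n")              # unknown batch/sequence dimension
x = (n, 512)                 # Tensor[(n, 512)]
h = matmul_shape(x, (512, 1024))   # symbol n propagates: (n, 1024)
y = matmul_shape(h, (1024, 256))   # still tracked globally: (n, 256)
print(y)
```

The point of the sketch is that the same `SymVar("n")` flows through every operator, so later passes can reason that two tensors share the same dynamic dimension instead of treating each unknown independently.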