2.4 ATOMUS: A 5nm 32TFLOPS/128TOPS ML System-on-Chip for Latency Critical Applications


Bibliographic Details
Published in: 2024 IEEE International Solid-State Circuits Conference (ISSCC), Vol. 67, pp. 42-44
Main Authors: Yu, Chang-Hyo, Kim, Hyo-Eun, Shin, Sungho, Bong, Kyeongryeol, Kim, Hyunsuk, Boo, Yoonho, Bae, Jaewan, Kwon, Minjae, Charfi, Karim, Kim, Jinseok, Kim, Hongyun, Shim, Myeongbo, Ha, Changsoo, Shin, Wongyu, Yoon, Jae-Sung, Chi, Miock, Lee, Byungjae, Choi, Sungpill, Kim, Donghan, Woo, Jeongseok, Yoon, Seokju, Jo, Hyunje, Kim, Hyunho, Heo, Hyungseok, Jin, Young-Jae, Yu, Jiun, Lee, Jaehwan, Kim, Hyunsung, Kang, Minhoo, Choi, Seokhyeon, Kim, Seung-Goo, Choi, Myunghoon, Oh, Jungju, Kim, Yunseong, Kim, Haejoon, Je, Sangeun, Ham, Junhee, Yoon, Juyeong, Lee, Jaedon, Park, Seonhyeok, Park, Youngseob, Lee, Jaebong, Hong, Boeui, Ryu, Jaehun, Ko, Hyunseok, Chung, Kwanghyun, Choi, Jongho, Jung, Sunwook, Arthanto, Yashael Faith, Kim, Jonghyeon, Cho, Heejin, Jeong, Hyebin, Choi, Sungmin, Han, Sujin, Park, Junkyu, Lee, Kwangbae, Bae, Sung-Il, Bang, Jaeho, Lee, Kyeong-Jae, Jang, Yeongsang, Park, Jungchul, Park, Sanggyu, Park, Jueon, Shin, Hyein, Park, Sunghyun, Oh, Jinwook
Format: Conference Proceeding
Language: English
Published: IEEE, 18-02-2024
Description
Summary: The growing computational demands of AI inference have led to the widespread use of hardware accelerators across platforms, spanning from the edge to the datacenter/cloud. Certain AI application areas, such as high-frequency trading (HFT) [1-2], impose a hard inference latency deadline for successful execution. We present our new AI accelerator, which achieves high inference capability with outstanding single-stream responsiveness for demanding service-level objective (SLO)-based AI services and pipelined inference applications, including large language models (LLMs). Owing to its low thermal design power (TDP), the scale-out solution effectively supports multi-stream applications as well as total cost of ownership (TCO)-centric systems.
ISSN:2376-8606
DOI:10.1109/ISSCC49657.2024.10454509
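The abstract's notion of a hard inference latency deadline can be made concrete with a small sketch. The Python below is illustrative only: the function name, the deadline value, and the generic `model` callable are assumptions for exposition, not part of the ATOMUS design or any published API. It shows the SLO semantics the abstract alludes to, where a late result counts as a failure rather than a slow success.

```python
import time

# Hypothetical hard-deadline wrapper. The 50 us budget is an assumed,
# illustrative value, not a figure from the paper.
SLO_DEADLINE_S = 50e-6


class DeadlineMissed(Exception):
    """Raised when an inference result arrives after its SLO deadline."""


def infer_with_deadline(model, request, deadline_s=SLO_DEADLINE_S):
    """Run one synchronous, single-stream inference under a hard deadline.

    In latency-critical settings such as HFT, a result that misses its
    deadline is useless, so the caller treats it as a failed execution.
    """
    start = time.perf_counter()
    result = model(request)                     # single-stream inference call
    latency = time.perf_counter() - start
    if latency > deadline_s:
        raise DeadlineMissed(
            f"{latency * 1e6:.1f} us > {deadline_s * 1e6:.1f} us budget"
        )
    return result, latency


if __name__ == "__main__":
    # Stand-in for a compiled network; trivially fast for the demo.
    dummy_model = lambda x: x
    out, lat = infer_with_deadline(
        dummy_model, request=[0.0] * 8, deadline_s=1e-3  # relaxed 1 ms demo budget
    )
    print(f"ok in {lat * 1e6:.1f} us")
```

Under this framing, single-stream responsiveness (rather than batched throughput alone) is what determines how often the deadline check passes, which is the property the abstract emphasizes for SLO-based services.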