VLSI implementation of a 16×16 discrete cosine transform

Bibliographic Details
Published in: IEEE Transactions on Circuits and Systems, Vol. 36, No. 4, pp. 610-617
Main Authors: Sun, M.-T., Chen, T.-C., Gottlieb, A.M.
Format: Journal Article
Language: English
Published: IEEE, April 1989
Description
Summary: The implementation of a 16×16 discrete cosine transform (DCT) chip using a concurrent architecture is presented. The chip contains 32 processing elements working in parallel and a random-access memory (RAM) that performs a 16×16 matrix transposition. The structure is highly regular and modular, and thus very efficient for VLSI implementation. The chip was designed for real-time processing of video data sampled at 14.3 MHz. It performs the equivalent of half a billion multiplications and accumulations per second. Fabricated in 2-µm double-metal CMOS technology, the chip contains approximately 73,000 transistors, which occupy a 7.2 × 7.0 mm² area. The 68-pad die measures 8.3 × 8.1 mm². It is fully functional and is the first working 16×16 DCT chip. The architecture and accuracy studies for finite-wordlength processing are presented. The circuit design and layout using the symbolic design tool MULGA are described in detail. Possible variations for multipurpose applications (variable transform sizes, forward and inverse transforms) are also discussed.
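The summary describes a row-column organisation: 16-point transforms along one dimension, a 16×16 transposition held in RAM, then 16-point transforms along the other dimension. The Python sketch below is not the chip's circuit. It assumes an orthonormal DCT-II computed as a direct matrix-vector product, and it adds a hypothetical `frac_bits` parameter that rounds the cosine coefficients to a fixed-point grid as a toy stand-in for a finite-wordlength study. It also shows one plausible accounting of the quoted throughput. The row pass and column pass each cost 16 MACs per sample, giving 32 MACs per sample, and 32 × 14.3 MHz = 457.6 million MACs/s, or roughly half a billion. The same figure results from 32 processing elements that each perform one MAC per sample period, though whether the chip schedules its work exactly this way is an assumption.

```python
import math
import random

N = 16                    # block size of the transform described in the summary
SAMPLE_RATE_HZ = 14.3e6   # video sample rate quoted in the summary


def dct_matrix(n=N, frac_bits=None):
    """Orthonormal DCT-II basis C[k][i] = a(k) * cos((2i+1) k pi / (2n)).

    If frac_bits is given, each coefficient is rounded to a multiple of
    2**-frac_bits (a hypothetical fixed-point coefficient wordlength)."""
    c = []
    for k in range(n):
        a = math.sqrt((1.0 if k == 0 else 2.0) / n)
        row = [a * math.cos((2 * i + 1) * k * math.pi / (2 * n)) for i in range(n)]
        if frac_bits is not None:
            scale = 2 ** frac_bits
            row = [round(v * scale) / scale for v in row]
        c.append(row)
    return c


def dct_1d(c, x, macs):
    """n-point DCT as a direct matrix-vector product; counts MACs in macs[0]."""
    out = []
    for row in c:
        acc = 0.0
        for coeff, sample in zip(row, x):
            acc += coeff * sample  # one multiply-accumulate
            macs[0] += 1
        out.append(acc)
    return out


def transpose(block):
    """Matrix transposition (the role the on-chip RAM plays between passes)."""
    return [list(col) for col in zip(*block)]


def dct_2d(block, frac_bits=None):
    """Row-column 2-D DCT: Y = C X C^T.

    Transforms every row, transposes, transforms every column, then transposes
    back so that y[k][l] is coefficient (k, l). Returns (y, MAC count)."""
    c = dct_matrix(len(block), frac_bits)
    macs = [0]
    rows = [dct_1d(c, r, macs) for r in block]             # first pass: rows
    cols = [dct_1d(c, r, macs) for r in transpose(rows)]   # second pass: columns
    return transpose(cols), macs[0]


if __name__ == "__main__":
    rng = random.Random(0)
    x = [[rng.randint(0, 255) for _ in range(N)] for _ in range(N)]
    y, macs = dct_2d(x)
    per_sample = macs / (N * N)
    print("MACs per 16x16 block:", macs)          # 8192
    print("MACs per sample:", per_sample)         # 32.0
    print("MAC rate: %.1f million/s" % (per_sample * SAMPLE_RATE_HZ / 1e6))  # 457.6
    y12, _ = dct_2d(x, frac_bits=12)
    err = max(abs(a - b) for ra, rb in zip(y, y12) for a, b in zip(ra, rb))
    print("max coefficient error with 12-bit coefficients: %.4f" % err)
```

The direct matrix-vector form was chosen because it matches a multiply-accumulate count per sample. Fast DCT factorizations need fewer multiplications but are less regular, and this sketch does not claim to represent the paper's choice between the two.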
ISSN: 0098-4094, 1558-1276
DOI: 10.1109/31.92893