Area and time efficient implementations of matrix multiplication on FPGAs

Bibliographic Details
Published in:	2002 IEEE International Conference on Field-Programmable Technology (FPT), 2002, Proceedings, pp. 93-100
Main Authors:	Ju-wook Jang, Seonil Choi, V.K.K. Prasanna
Format:	Conference Proceeding
Language:	English
Published:	New York, NY: IEEE, 2002
Subjects:	Applied sciences; Computer systems; Delay; Electronics; Exact sciences and technology; Field programmable gate arrays; Frequency; Graphics; Hardware; Image processing; Kernel; Robots; Signal processing; Signal processing algorithms; Performance evaluation; Field programmable gate array; Computer hardware; Algorithm; Implementation

Description
Summary:	We develop new algorithms and architectures for matrix multiplication on configurable hardware. These designs significantly reduce both latency and area. Our designs improve on previous designs in terms of the area/speed metric, where speed denotes the maximum achievable running frequency. The area/speed metrics for the two previous designs and our design are 14.45, 4.93, and 2.35, respectively, for 4 × 4 matrix multiplication. The latency of one of the previous designs is 0.57 µs, while our design takes 0.15 µs using 18% less area. The area of our designs is 11% to 46% smaller than that of the best known systolic designs with the same latency for matrices of sizes 3 × 3 to 12 × 12. The performance improvements tend to grow with the problem size.
ISBN:	0780375742 9780780375741
DOI:	10.1109/FPT.2002.1188669
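
The figures quoted in the Summary can be turned into improvement factors with a few lines of arithmetic. The sketch below is only an illustration of that calculation. It assumes a lower area/speed value is better, which the Summary implies but does not state. The labels `previous design A` and `previous design B` are hypothetical placeholders, since the record does not name the earlier designs, and it does not say which of them the 0.57 µs latency belongs to.

```python
# Figures quoted in the Summary for 4 x 4 matrix multiplication.
# The "A"/"B" labels are hypothetical; the record does not name the designs.
AREA_SPEED = {
    "previous design A": 14.45,
    "previous design B": 4.93,
    "proposed design": 2.35,
}

# Latency in microseconds. The Summary attributes 0.57 to "one of the
# previous designs" without saying which one.
LATENCY_US = {"previous": 0.57, "proposed": 0.15}


def improvement_factor(baseline: float, proposed: float) -> float:
    """Return how many times smaller the proposed value is than the baseline.

    Assumes smaller is better for the quantity being compared.
    """
    return baseline / proposed


# Area/speed improvement of the proposed design over each earlier design.
area_speed_factors = {
    name: improvement_factor(value, AREA_SPEED["proposed design"])
    for name, value in AREA_SPEED.items()
    if name != "proposed design"
}

# Latency improvement of the proposed design over the quoted earlier design.
latency_factor = improvement_factor(LATENCY_US["previous"], LATENCY_US["proposed"])

if __name__ == "__main__":
    for name, factor in area_speed_factors.items():
        print(f"area/speed vs {name}: {factor:.2f}x")
    print(f"latency vs previous design: {latency_factor:.2f}x")
```

Under these assumptions the area/speed metric improves by roughly 6.1x and 2.1x over the two earlier designs, and the latency by about 3.8x.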