Combining Data Reuse With Data-Level Parallelization for FPGA-Targeted Hardware Compilation: A Geometric Programming Framework

A nonlinear optimization framework is proposed in this paper to automate exploration of the design space consisting of data-reuse (buffering) decisions and loop-level parallelization, in the context of field-programmable-gate-array-targeted hardware compilation. Buffering frequently accessed data in...

Full description

Saved in:

Bibliographic Details
Published in:	IEEE transactions on computer-aided design of integrated circuits and systems Vol. 28; no. 3; pp. 305 - 315
Main Authors:	Qiang Liu, Constantinides, G.A., Masselos, K., Cheung, P.
Format:	Journal Article
Language:	English
Published:	New York IEEE 01-03-2009 The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects:	Automatic programming Computation Computer architecture data reuse Data-level parallelization Design engineering Design optimization Field programmable gate arrays field-programmable gate-array (FPGA) hardware compilation geometric programming Hardware Memory management Nonlinearity Optimization Optimization methods Parallel processing Parallel programming Programming Random access memory Reuse Studies System-on-a-chip
Online Access:	Get full text
Tags:	Add Tag No Tags, Be the first to tag this record!

Description
Summary:	A nonlinear optimization framework is proposed in this paper to automate exploration of the design space consisting of data-reuse (buffering) decisions and loop-level parallelization, in the context of field-programmable-gate-array-targeted hardware compilation. Buffering frequently accessed data in on-chip memories can reduce off-chip memory accesses and open avenues for parallelization. However, the exploitation of both data reuse and parallelization is limited by the memory resources available on-chip. As a result, considering these two problems separately, e.g., first exploring data reuse and then exploring data-level parallelization, based on the data-reuse options determined in the first step, may not yield the performance-optimal designs for limited on-chip memory resources. We consider both problems at the same time, exposing the dependence between the two. We show that this combined problem can be formulated as a nonlinear program and further show that efficient solution techniques exist for this problem, based on recent advances in optimization of so-called geometric programming problems. The results from applying this framework to several real benchmarks implemented on a Xilinx device demonstrate that given different constraints on on-chip memory utilization, the corresponding performance-optimal designs are automatically determined by the framework. We have also implemented designs determined by a two-stage optimization method that first explores data reuse and then explores parallelization on the same platform, and by comparison, the performance-optimal designs proposed by our framework are faster than the designs determined by the two-stage method by up to 5.7 times.
Bibliography:	ObjectType-Article-2 SourceType-Scholarly Journals-1 ObjectType-Feature-1 content type line 23
ISSN:	0278-0070 1937-4151
DOI:	10.1109/TCAD.2009.2013541