Accurately modeling superscalar processor performance with reduced trace


Bibliographic Details
Published in: Journal of Parallel and Distributed Computing, Vol. 73, no. 4, pp. 509-521
Main Authors: Lee, Kiyeon; Cho, Sangyeun
Format: Journal Article
Language: English
Published: Amsterdam: Elsevier Inc, 01-04-2013
Elsevier
Description
Summary: Trace-driven simulation of out-of-order superscalar processors is far from straightforward. The dynamic nature of out-of-order superscalar processors combined with the static nature of traces can lead to large inaccuracies in the results when the traces contain only a subset of executed instructions for trace reduction. In this paper, we describe and comprehensively evaluate the pairwise dependent cache miss model (PDCM), a framework for fast and accurate trace-driven simulation of out-of-order superscalar processors. The model determines how to treat a cache miss with respect to other cache misses recorded in the trace by dynamically reconstructing the reorder buffer state during simulation and honoring the dependencies between the trace items. Our experimental results demonstrate that a PDCM-based simulator produces highly accurate simulation results (less than 3% error) with fast simulation speeds (62.5× on average) compared with an execution-driven simulator. Moreover, we observed that the proposed simulation method is capable of preserving a processor’s dynamic off-core memory access behavior and accurately predicting the relative performance change when a processor’s low-level memory hierarchy parameters are changed.
► Superscalar processor performance can be accurately modeled with filtered trace.
► Our trace simulation model is 62.5× faster than an execution-driven simulator.
► Filtered trace can be used to model important processor artifacts.
► Our trace simulation model can be used to study system-wide resources.
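The summary's core idea (deciding whether a cache miss overlaps with or waits for earlier misses, based on reorder buffer occupancy and dependencies between trace items) can be illustrated with a toy sketch. This is not the authors' PDCM implementation. The `Miss` record, the `estimate_cycles` function, and the parameters `rob_size`, `miss_latency`, and `width` are hypothetical names and values chosen only for illustration, and the timing rules are a deliberately simplified assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Miss:
    """One item of a filtered trace (hypothetical format): a cache miss."""
    inst_index: int                   # dynamic instruction number of the missing load
    depends_on: Optional[int] = None  # trace position of an earlier miss it depends on


def estimate_cycles(trace: List[Miss], rob_size: int = 64,
                    miss_latency: int = 200, width: int = 4) -> int:
    """Estimate the cycle at which the last miss completes (toy model)."""
    dispatch: List[int] = []  # cycle each miss enters the reorder buffer
    done: List[int] = []      # cycle each miss returns from memory
    for i, miss in enumerate(trace):
        prev_index = trace[i - 1].inst_index if i else 0
        prev_time = dispatch[i - 1] if i else 0
        gap = miss.inst_index - prev_index
        # Instructions between two trace items dispatch at `width` per cycle.
        t = prev_time + -(-gap // width)  # ceiling division
        # Reorder buffer constraint: an earlier miss that is rob_size or more
        # instructions older must complete before this one can be dispatched.
        for j in range(i):
            if miss.inst_index - trace[j].inst_index >= rob_size:
                t = max(t, done[j])
        dispatch.append(t)
        # Dependency constraint: wait for the miss that produces our input.
        issue = t
        if miss.depends_on is not None:
            issue = max(issue, done[miss.depends_on])
        done.append(issue + miss_latency)
    return max(done, default=0)


# Two nearby independent misses overlap; a dependent pair is serialized.
overlapped = estimate_cycles([Miss(0), Miss(10)])
serialized = estimate_cycles([Miss(0), Miss(10, depends_on=0)])
```

Under these assumptions, two independent misses that fit in the reorder buffer together cost little more than one miss latency, while a dependent pair, or a pair too far apart to share the buffer, costs roughly two.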
ISSN: 0743-7315, 1096-0848
DOI: 10.1016/j.jpdc.2012.12.002