The impact of incorrectly speculated memory operations in a multithreaded architecture

The speculated execution of threads in a multithreaded architecture, plus the branch prediction used in each thread execution unit, allows many instructions to be executed speculatively, that is, before it is known whether they actually needed by the program. In this study, we examine how the load i...

Full description

Saved in:

Bibliographic Details
Published in:	IEEE transactions on parallel and distributed systems Vol. 16; no. 3; pp. 271 - 285
Main Authors:	Sendag, R., Ying Chen, Lilja, D.J.
Format:	Journal Article
Language:	English
Published:	New York IEEE 01-03-2005 The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects:	Benchmark testing Communication networks Computer memory Delay Load mispredicted loads multithreaded architecture Pipelines Pollution Prefetching Registers Speculation Studies System performance Traffic control wrong execution wrong execution cache
Online Access:	Get full text
Tags:	Add Tag No Tags, Be the first to tag this record!

Description
Summary:	The speculated execution of threads in a multithreaded architecture, plus the branch prediction used in each thread execution unit, allows many instructions to be executed speculatively, that is, before it is known whether they actually needed by the program. In this study, we examine how the load instructions executed on what turn out to be incorrectly executed program paths impact the memory system performance. We find that incorrect speculation (wrong execution) on the instruction and thread-level provides an indirect prefetching effect for the later correct execution paths and threads. By continuing to execute the mispredicted load instructions even after the instruction or thread-level control speculation is known to be incorrect, the cache misses observed on the correctly executed paths can be reduced by 16 to 73 percent, with an average reduction of 45 percent. However, we also find that these extra loads can increase the amount of memory traffic and can pollute the cache. We introduce the small, fully associative wrong execution cache (WEC) to eliminate the potential pollution that can be caused by the execution of the mispredicted load instructions. Our simulation results show that the WEC can improve the performance of a concurrent multithreaded architecture up to 18.5 percent on the benchmark programs tested, with an average improvement of 9.7 percent, due to the reductions in the number of cache misses.
Bibliography:	ObjectType-Article-2 SourceType-Scholarly Journals-1 ObjectType-Feature-1 content type line 23
ISSN:	1045-9219 1558-2183
DOI:	10.1109/TPDS.2005.36