Vectorizing divergent control flow with active-lane consolidation on long-vector architectures
Published in: | The Journal of supercomputing Vol. 78; no. 10; pp. 12553 - 12588 |
---|---|
Main Authors: | , , , , |
Format: | Journal Article |
Language: | English |
Published: | New York, Springer US, 01-07-2022, Springer Nature B.V |
Subjects: | |
Summary: | Control-flow divergence limits the applicability of loop vectorization, an important code-transformation that accelerates data-parallel loops. Control-flow divergence is commonly handled using an IF-conversion transformation combined with vector predication. However, the resulting vector instructions execute inefficiently with many inactive lanes. Branch-on-superword-condition-code (BOSCC) instructions are used to skip over some vector instructions, but their effectiveness decreases as vector length increases. This paper presents a novel vector permutation, Active-lane consolidation (ALC), that enables efficient execution of control-divergent loops by consolidating the active lanes of two vectors. This paper demonstrates the use of ALC with two loop transformations and applies them to kernels extracted from the SPEC CPU 2017 benchmark suite leading to up to a 30.9% reduction in dynamic instruction count compared to optimization using only BOSCCs. Motivated by ALC, this paper also proposes design changes to the ARM scalable vector extension (SVE) to improve vectorization of control-divergent loops. |
---|---|
ISSN: | 0920-8542 1573-0484 |
DOI: | 10.1007/s11227-022-04359-w |
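
The summary describes ALC only at a high level: the active lanes of two vectors are consolidated so that fewer vector instructions run with mostly inactive lanes. The scalar emulation below is a minimal sketch of that general idea, not the paper's actual permutation and not an SVE instruction sequence. The function name `consolidate_active_lanes`, the lane ordering, and the placement of inactive lanes in the outputs are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple


def consolidate_active_lanes(
    vec_a: Sequence[int],
    pred_a: Sequence[bool],
    vec_b: Sequence[int],
    pred_b: Sequence[bool],
) -> Tuple[List[int], List[bool], List[int], List[bool]]:
    """Hypothetical scalar model of consolidating the active lanes of two vectors.

    Active lanes (predicate True) from both inputs are packed to the front of
    the first output vector; leftover lanes spill into the second output.
    The ordering chosen here is an assumption, not taken from the paper.
    """
    if not (len(vec_a) == len(pred_a) == len(vec_b) == len(pred_b)):
        raise ValueError("all vectors and predicates must have the same length")

    width = len(vec_a)
    lanes = list(zip(vec_a, pred_a)) + list(zip(vec_b, pred_b))

    # Stable partition: active lanes first, inactive lanes after them.
    active = [value for value, is_active in lanes if is_active]
    inactive = [value for value, is_active in lanes if not is_active]
    packed = active + inactive

    # Split the packed lanes back into two vectors of the original width
    # and rebuild the predicates so they cover exactly the active lanes.
    out_first, out_second = packed[:width], packed[width:]
    mask_first = [i < len(active) for i in range(width)]
    mask_second = [width + i < len(active) for i in range(width)]
    return out_first, mask_first, out_second, mask_second


# Two 4-lane vectors with three active lanes between them.
first, first_mask, second, second_mask = consolidate_active_lanes(
    [0, 1, 2, 3], [True, False, False, True],
    [4, 5, 6, 7], [False, True, False, False],
)
```

In this example all three active lanes end up in the first output vector, and the second output has an all-false predicate. A vectorized loop could then skip the work for the second vector with a single branch, which is the kind of situation the summary says becomes rarer for BOSCCs alone as vector length grows.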