High-Performance Parallel Radix Sort on FPGA

Bibliographic Details
Published in: 2020 IEEE 28th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), p. 224
Main Authors: Romanous, Bashar; Rezvani, Mohammadreza; Huang, Junjie; Wong, Daniel; Papalexakis, Evangelos E.; Tsotras, Vassilis J.; Najjar, Walid
Format: Conference Proceeding
Language: English
Published: IEEE 01-05-2020
Description
Summary: Sorting is a key part of database operators such as duplicate elimination, sort-merge joins, and group-by aggregations. Sorting billions of records in a fast and energy-efficient manner has become a key research challenge. In this work, we explore in-memory sorting using a parallel version of Radix Sort to build a high-performance hardware accelerator called HARS (Hardware Accelerated Radix Sort). Our design divides the unsorted dataset among parallel engines without the need for a merge step. HARS is implemented on Micron's SB-852 FPGA board. The proposed accelerator provides high-throughput in-memory sorting at a rate of 44 million 128-bit records per second. HARS is 1.4x faster than a CPU and 1.36x faster than a GPU when GPU bandwidth is normalized. A proposed board with a more capable FPGA chip is projected to yield 1.25x higher throughput.
ISSN: 2576-2621
DOI: 10.1109/FCCM48280.2020.00055
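
The summary states that the dataset is divided among parallel engines without a merge step, but it does not say how HARS achieves this. One common way to get that property, shown here only as a software sketch and not as the authors' hardware design, is to partition records by their most significant bits, sort each partition independently, and concatenate the results in partition order. The names `parallel_radix_sort` and `lsd_radix_sort`, the 4-bit partitioning, the 8-bit digit size, and the use of a thread pool are all assumptions made for illustration; only the 128-bit record width comes from the summary.

```python
from concurrent.futures import ThreadPoolExecutor

KEY_BITS = 128       # record width, taken from the summary
PARTITION_BITS = 4   # assumption: top 4 bits select one of 16 partitions
DIGIT_BITS = 8       # assumption: radix-256 passes inside each partition


def lsd_radix_sort(keys, key_bits):
    """Stable least-significant-digit radix sort on non-negative integers.

    Processes enough DIGIT_BITS-wide passes to cover the low `key_bits` bits.
    """
    mask = (1 << DIGIT_BITS) - 1
    for shift in range(0, key_bits, DIGIT_BITS):
        buckets = [[] for _ in range(1 << DIGIT_BITS)]
        for k in keys:
            buckets[(k >> shift) & mask].append(k)
        # Concatenating buckets in order keeps each pass stable.
        keys = [k for bucket in buckets for k in bucket]
    return keys


def parallel_radix_sort(keys, workers=4):
    """Sort 128-bit unsigned keys by partitioning on the top bits.

    Every key in partition i is smaller than every key in partition i + 1,
    so the sorted partitions can be concatenated with no merge step.
    """
    low_bits = KEY_BITS - PARTITION_BITS
    partitions = [[] for _ in range(1 << PARTITION_BITS)]
    for k in keys:
        partitions[k >> low_bits].append(k)

    # Each partition is independent; the thread pool stands in for the
    # parallel engines (it is illustrative, not a performance claim).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        sorted_parts = list(
            pool.map(lambda part: lsd_radix_sort(part, low_bits), partitions)
        )

    return [k for part in sorted_parts for k in part]
```

Calling `parallel_radix_sort(records)` on a list of integers in the range 0 to 2^128 - 1 returns them in ascending order. Because the partitions cover disjoint, ordered key ranges, the final step is a plain concatenation; the trade-off is that skewed key distributions produce unevenly sized partitions.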