Search Results - "Tallent, Nathan R."
1
Evaluating Modern GPU Interconnect: PCIe, NVLink, NV-SLI, NVSwitch and GPUDirect
Published in IEEE transactions on parallel and distributed systems (01-01-2020) “…High performance multi-GPU computing becomes an inevitable trend due to the ever-increasing demand on computation capability in emerging domains such as deep…”
Journal Article
2
Characterizing Performance of Graph Neighborhood Communication Patterns
Published in IEEE transactions on parallel and distributed systems (01-04-2022) “…Distributed-memory graph algorithms are fundamental enablers in scientific computing and analytics workflows. A majority of graph algorithms rely on the graph…”
Journal Article
3
MemGaze: Rapid and Effective Load-Level Memory Trace Analysis
Published in 2022 IEEE International Conference on Cluster Computing (CLUSTER) (01-09-2022) “…A challenge of memory trace analysis is combining detailed analysis and low overhead measurement. Currently, hardware/software-based analysis of load-level…”
Conference Proceeding
4
EXAGRAPH: Graph and combinatorial methods for enabling exascale applications
Published in The international journal of high performance computing applications (01-11-2021) “…Combinatorial algorithms in general and graph algorithms in particular play a critical enabling role in numerous scientific applications. However, the…”
Journal Article
5
Rapidly Measuring Loop Footprints
Published in 2019 IEEE International Conference on Cluster Computing (CLUSTER) (01-09-2019) “…Knowing a loop's footprint - the unique data items it accesses - enables important locality and capacity analysis. Unfortunately, current methods for computing…”
Conference Proceeding
6
Identifying Performance Bottlenecks in Work-Stealing Computations
Published in Computer (Long Beach, Calif.) (01-12-2009) “…Work stealing is an effective load-balancing strategy for multithreading, but when computations based on it underperform, traditional tools can't explain why…”
Journal Article
7
ReWorDs 2022 Keynote: Towards Orchestrating Distributed & Data-Intensive Workflows
Published in 2022 IEEE 18th International Conference on e-Science (e-Science) (01-10-2022) “…Scientific exploration and hypothesis generation is increasingly dependent on the convergence of scientific modeling, data analytics, and machine learning. The…”
Conference Proceeding
8
Vertex Reordering for Real-World Graphs and Applications: An Empirical Evaluation
Published in 2020 IEEE International Symposium on Workload Characterization (IISWC) (01-10-2020) “…Vertex reordering is a way to improve locality in graph computations. Given an input (or "natural") order, reordering aims to compute an alternate permutation…”
Conference Proceeding
9
Graph Analytics on Jellyfish topology
Published in 2024 IEEE International Parallel and Distributed Processing Symposium (IPDPS) (27-05-2024) “…Because large unstructured datasets are important for many science domains, distributed graph analytics is critical to many scientists. Unfortunately,…”
Conference Proceeding
10
Effectively Presenting Call Path Profiles of Application Performance
Published in 2010 39th International Conference on Parallel Processing Workshops (01-09-2010) “…Call path profiling is a scalable measurement technique that has been shown to provide insight into the performance characteristics of complex modular…”
Conference Proceeding
11
QuaL²M: Learning Quantitative Performance of Latency-Sensitive Code
Published in 2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW) (01-05-2022) “…Quantitative performance predictions are more informative than qualitative. However, modeling of latency-sensitive code, with cost distributions of high…”
Conference Proceeding
12
Rapid Memory Footprint Access Diagnostics
Published in 2020 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) (01-08-2020) “…Footprint and reuse distance measure temporal locality and therefore do not capture the significance of access patterns (spatial locality). A strided access…”
Conference Proceeding
13
Effectively Using Remote I/O For Work Composition in Distributed Workflows
Published in 2020 IEEE International Conference on Big Data (Big Data) (10-12-2020) “…Distributed scientific workflows are becoming more important with the interest in incorporating AI into their loops. A critical programming and performance…”
Conference Proceeding
14
TAZeR: Hiding the Cost of Remote I/O in Distributed Scientific Workflows
Published in 2019 IEEE International Conference on Big Data (Big Data) (01-12-2019) “…Many scientific workflows access data derived from specialized instruments. When the data is analyzed, it is accessed over wide area networks, creating…”
Conference Proceeding
15
Geomancy: Automated Performance Enhancement through Data Layout Optimization
Published in 2020 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS) (01-08-2020) “…The size and complexity of large storage systems, such as high-performance computing (HPC) systems, inhibit rapid effective restructuring of data layouts to…”
Conference Proceeding
16
SAM-I-Am: Semantic boosting for zero-shot atomic-scale electron micrograph segmentation
Published in Computational materials science (01-01-2025) “…Image segmentation is a critical enabler for tasks ranging from medical diagnostics to autonomous driving. However, the correct segmentation semantics — where…”
Journal Article
17
Fault Modeling of Extreme Scale Applications Using Machine Learning
Published in 2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS) (01-05-2016) “…Faults are commonplace in large scale systems. These systems experience a variety of faults such as transient, permanent and intermittent. Multi-bit faults are…”
Conference Proceeding, Journal Article
18
Scaling Deep Learning workloads: NVIDIA DGX-1/Pascal and Intel Knights Landing
Published in Future generation computer systems (01-07-2020) “…Deep Learning (DL) algorithms have become ubiquitous in data analytics. As a result, major computing vendors – including NVIDIA, Intel, AMD, and IBM – have…”
Journal Article
19
MassiveGNN: Efficient Training via Prefetching for Massively Connected Distributed Graphs
Published in Proceedings / IEEE International Conference on Cluster Computing (24-09-2024) “…Graph Neural Networks (GNN) are indispensable in learning from graph-structured data, yet their rising computational costs, especially on massively connected…”
Conference Proceeding
20
Accelerating matrix-centric graph processing on GPUs through bit-level optimizations
Published in Journal of parallel and distributed computing (04-03-2023) “…Even though it is well known that binary values are common in graph applications (e.g., adjacency matrix), how to leverage the phenomenon for efficiency has…”
Journal Article