Search Results - "Ganger, Gregory R."
1
TVARAK: Software-Managed Hardware Offload for Redundancy in Direct-Access NVM Storage
Published in 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA) (01-05-2020)“…Production storage systems complement device-level ECC (which covers media errors) with system-checksums and cross-device parity. This system-level redundancy…”
Conference Proceeding
2
Open Cirrus: A Global Cloud Computing Testbed
Published in Computer (Long Beach, Calif.) (01-04-2010)“…Open Cirrus is a cloud computing testbed that, unlike existing alternatives, federates distributed data centers. It aims to spur innovation in systems and…”
Journal Article
3
Visualizing Request-Flow Comparison to Aid Performance Diagnosis in Distributed Systems
Published in IEEE transactions on visualization and computer graphics (01-12-2013)“…Distributed systems are complex to develop and administer, and performance problem diagnosis is particularly challenging. When performance degrades, the…”
Journal Article
4
Compact Filters for Fast Online Data Partitioning
Published in 2019 IEEE International Conference on Cluster Computing (CLUSTER) (01-09-2019)“…We are approaching a point in time when it will be infeasible to catalog and query data after it has been generated. This trend has fueled research on in-situ…”
Conference Proceeding
5
Survivable information storage systems
Published in Computer (Long Beach, Calif.) (01-08-2000)“…As society increasingly relies on digitally stored and accessed information, supporting the availability, integrity and confidentiality of this information is…”
Journal Article
6
On IO Latency Prediction Accuracy and Automated Load Balancing in Consolidated VM Environments
Published in 2016 IEEE International Conference on Cloud Engineering (IC2E) (01-04-2016)“…Manually managing IO workloads and performance in consolidated VM environments is often difficult and error prone. Thus, automated IO workload (re) placement…”
Conference Proceeding
7
Disk arrays: high-performance, high-reliability storage subsystems
Published in Computer (Long Beach, Calif.) (01-03-1994)“…As the performance of other system components continues to improve rapidly, storage subsystem performance becomes increasingly important. Storage subsystem…”
Journal Article
8
Efficient Byzantine-tolerant erasure-coded storage
Published in International Conference on Dependable Systems and Networks, 2004 (2004)“…This paper describes a decentralized consistency protocol for survivable storage that exploits local data versioning within each storage-node. Such versioning…”
Conference Proceeding
9
Dynamic quarantine of Internet worms
Published in International Conference on Dependable Systems and Networks, 2004 (2004)“…If we limit the contact rate of worm traffic, can we alleviate and ultimately contain Internet worms? This paper sets out to answer this question…”
Conference Proceeding
10
PipeFill: Using GPUs During Bubbles in Pipeline-parallel LLM Training
Published 23-09-2024“…Training Deep Neural Networks (DNNs) with billions of parameters generally involves pipeline-parallel (PP) execution. Unfortunately, PP model training can use…”
Journal Article
11
Scheduling speculative tasks in a compute farm
Published in ACM/IEEE SC 2005 Conference (SC'05) (12-11-2005)“…Users often behave speculatively, submitting work that initially they do not know is needed. Farm computing often consists of single node speculative tasks…”
Conference Proceeding
12
Vilamb: Low Overhead Asynchronous Redundancy for Direct Access NVM
Published 20-04-2020“…Vilamb provides efficient asynchronous system-redundancy for direct access (DAX) non-volatile memory (NVM) storage. Production storage deployments often use…”
Journal Article
13
Zzyzx: Scalable fault tolerance through Byzantine locking
Published in 2010 IEEE/IFIP International Conference on Dependable Systems & Networks (DSN) (01-06-2010)“…Zzyzx is a Byzantine fault-tolerant replicated state machine protocol that outperforms prior approaches and provides near-linear throughput scaling. Using a…”
Conference Proceeding
14
Tvarak: Software-managed hardware offload for DAX NVM storage redundancy
Published 26-08-2019“…Tvarak efficiently implements system-level redundancy for direct-access (DAX) NVM storage. Production storage systems complement device-level ECC (which covers…”
Journal Article
15
Co-scheduling of Disk Head Time in Cluster-Based Storage
Published in 2009 28th IEEE International Symposium on Reliable Distributed Systems (01-09-2009)“…Disk time slicing is a promising technique for storage performance insulation. To work with cluster based storage, however, time slices associated with striped…”
Conference Proceeding
16
GraphPipe: Improving Performance and Scalability of DNN Training with Graph Pipeline Parallelism
Published 24-06-2024“…Deep neural networks (DNNs) continue to grow rapidly in size, making them infeasible to train on a single device. Pipeline parallelism is commonly used in…”
Journal Article
17
DeltaFS: A Scalable No-Ground-Truth Filesystem For Massively-Parallel Computing
Published in SC21: International Conference for High Performance Computing, Networking, Storage and Analysis (14-11-2021)“…High-Performance Computing (HPC) is known for its use of massive concurrency. But it can be challenging for a parallel filesystem's control plane to utilize…”
Conference Proceeding
18
PACEMAKER: Avoiding HeART attacks in storage clusters with disk-adaptive redundancy
Published 15-03-2021“…14th USENIX Symposium on Operating Systems Design and Implementation (OSDI), 2020, (pp. 369-385) Data redundancy provides resilience in large-scale storage…”
Journal Article
19
Scaling Embedded In-Situ Indexing with DeltaFS
Published in SC18: International Conference for High Performance Computing, Networking, Storage and Analysis (01-11-2018)“…Analysis of large-scale simulation output is a core element of scientific inquiry, but analysis queries may experience significant I/O overhead when the data…”
Conference Proceeding
20
MLtuner: System Support for Automatic Machine Learning Tuning
Published 20-03-2018“…MLtuner automatically tunes settings for training tunables (such as the learning rate, the momentum, the mini-batch size, and the data staleness bound) that…”
Journal Article