Programming Languages, Systems and Tools
Associated Publications
Scaling Implicit Parallelism via Dynamic Control Replication
A Programmable Approach to Neural Network Compression
Zeroploit: Exploiting Zero Valued Operands in Interactive Gaming Applications
Speculative Reconvergence for Improved SIMT Efficiency
Legate NumPy: Accelerated and Distributed Array Computing
NVBit: A Dynamic Binary Instrumentation Framework for NVIDIA GPUs
Throughput-oriented GPU memory allocation
Optimizing Software-Directed Instruction Replication for GPU Error Detection
Kayotee: A Fault Injection-based System to Assess the Safety and Reliability of Autonomous Vehicles to Faults and Errors
Relaxations for High-Performance Message Passing on Massively Parallel SIMT Processors
TriCheck: Memory Model Verification at the Trisection of Software, Hardware, and ISA
Automated Synthesis of Comprehensive Memory Model Litmus Test Suites
MemcachedGPU: Scaling-up Scale-out Key-value Stores
Flexible Software Profiling of GPU Architectures
Exploring the Design Space of SPMD Divergence Management on Data-Parallel Architectures
NOVA: A Functional Language for Data Parallelism
Policy-based Tuning for Performance Portability and Library Co-optimization
Processing Device Arrays with C++ Metaprogramming
Copperhead: Compiling an Embedded Data Parallel Language
Programming Massively Parallel Processors: A Hands-on Approach
Parallel Computing Experiences with CUDA
Scalable Parallel Programming with CUDA
Researchers
Aamer Jaleel
Aaron Lefohn
Albert Sidelnik
Benjamin Klenk
Cris Cecka
Daniel Lustig
David Nellans
Isaac Gelado
Mark Kilgard
Mark Stephenson
Michael Bauer
Michael Garland
Oreste Villa
Saurav Muralidharan
Scott Mahlke
Sean Treichler
Siva Hari
Steven Dalton
Tim Foley
Wen-mei Hwu
William Dally
Yaosheng Fu
Yong He