David Nellans
Dave Nellans joined NVIDIA in 2013 and leads the Architecture Research Group.
Research Area(s)
Artificial Intelligence and Machine Learning
Computer Architecture
High Performance Computing
Hyperscale Graphics
Programming Languages, Systems and Tools
Storage and Systems
Main Field of Interest
Computer Architecture
Google Scholar
https://scholar.google.com/citations?user=mjvx1GIAAAAJ&hl=en
Publications
2023
Parsimony: Enabling SIMD/Vector Programming in Standard Compiler Flows
Vijay Kandiah, Daniel Lustig, Oreste Villa, David Nellans, Nikos Hardavellas
International Symposium on Code Generation and Optimization
2022
The Implications of Page Size Management on Graph Analytics
Aninda Manocha, Zi Yan, Esin Tureci, Juan Luis Aragón, David Nellans, Margaret Martonosi
International Symposium on Workload Characterization (IISWC)
2021
GPU Domain Specialization via Composable On-Package Architecture
Yaosheng Fu, Evgeny Bolotin, Niladrish Chatterjee, David Nellans, Steve Keckler
ACM Transactions on Architecture and Code Optimization (TACO)
GPS: A Global Publish-Subscribe Model for Multi-GPU Memory Management
Harini Muthukrishnan, Daniel Lustig, David Nellans, Thomas Wenisch
International Symposium on Microarchitecture (MICRO)
Best Paper nominee, IEEE Micro Top Picks in Computer Architecture (Honorable Mention)
Efficient Multi-GPU Shared Memory via Automatic Optimization of Fine-Grained Transfers
Harini Muthukrishnan, David Nellans, Daniel Lustig, Jeffrey Fessler, Thomas Wenisch
International Symposium on Computer Architecture (ISCA)
GPU Domain Specialization via Composable On-Package Architecture
Yaosheng Fu, Evgeny Bolotin, Niladrish Chatterjee, David Nellans, Steve Keckler
arXiv
Need for Speed: Experiences Building a Trustworthy System-Level GPU Simulator
Oreste Villa, Daniel Lustig, Zi Yan, Evgeny Bolotin, Yaosheng Fu, Niladrish Chatterjee, Ted Jiang, David Nellans
International Symposium on High Performance Computer Architecture (HPCA)
2020
The Architectural Implications of Distributed Reinforcement Learning on CPU-GPU Systems
Ahmet Inci, Evgeny Bolotin, Yaosheng Fu, Gal Dalal, Shie Mannor, David Nellans, Diana Marculescu
Workshop on Energy Efficient Machine Learning and Cognitive Computing (EMC2)
Locality-Centric Data and Threadblock Management for Massive GPUs
Mahmoud Khairy, Vadim Nikiforov, David Nellans, Timothy G. Rogers
International Symposium on Microarchitecture (MICRO)
Buddy Compression: Enabling Larger Memory for Deep Learning and HPC Workloads on GPUs
Esha Choukse, Michael B. Sullivan, Mike O'Connor, Mattan Erez, Jeff Pool, David Nellans, Steve Keckler
International Symposium on Computer Architecture (ISCA)
HMG: Extending Cache Coherence Protocols Across Modern Hierarchical Multi-GPU Systems
Xiaowei Ren, Daniel Lustig, Evgeny Bolotin, Aamer Jaleel, Oreste Villa, David Nellans
International Symposium on High Performance Computer Architecture (HPCA)
2019
NVBit: A Dynamic Binary Instrumentation Framework for NVIDIA GPUs
Oreste Villa, Mark Stephenson, David Nellans, Steve Keckler
International Symposium on Microarchitecture (MICRO)
Optimizing Multi-GPU Parallelization Strategies for Deep Learning Training
Saptadeep Pal, Eiman Ebrahimi, Arslan Zulfiqar, Yaosheng Fu, Victor Zhang, Szymon Migacz, David Nellans, Puneet Gupta
IEEE MICRO: Special Edition on Machine Learning Acceleration
Optimizing Multi-GPU Parallelization Strategies for Deep Learning Training
Saptadeep Pal, Eiman Ebrahimi, Arslan Zulfiqar, Yaosheng Fu, Victor Zhang, Szymon Migacz, David Nellans, Puneet Gupta
arXiv
Translation Ranger: Operating System Support for Contiguity-Aware TLBs
Zi Yan, Daniel Lustig, David Nellans, Abhishek Bhattacharjee
International Symposium on Computer Architecture (ISCA)
Buddy Compression: Enabling Larger Memory for Deep Learning and HPC Workloads on GPUs
Esha Choukse, Michael B. Sullivan, Mike O'Connor, Mattan Erez, Jeff Pool, David Nellans, Stephen W. Keckler
arXiv
Nimble Page Management for Tiered Memory Systems
Zi Yan, Daniel Lustig, David Nellans, Abhishek Bhattacharjee
International Conference on Architectural Support for Programming Languages and…
Understanding the Future of Energy Efficiency in Multi-Module GPUs
Akhil Arunkumar, Evgeny Bolotin, David Nellans, Carole-Jean Wu
International Symposium on High Performance Computer Architecture (HPCA)
2018
Combining HW/SW Mechanisms to Improve NUMA Performance of Multi-GPU Systems
Vinson Young, Aamer Jaleel, Evgeny Bolotin, Eiman Ebrahimi, David Nellans, Oreste Villa
International Symposium on Microarchitecture (MICRO)
2017
Beyond the Socket: NUMA-Aware GPUs
Ugljesa Milic, Oreste Villa, Evgeny Bolotin, Akhil Arunkumar, Eiman Ebrahimi, Aamer Jaleel, Alex Ramirez, David Nellans
International Symposium on Microarchitecture (MICRO)
MCM-GPU: Multi-Chip-Module GPUs for Continued Performance Scalability
Akhil Arunkumar, Evgeny Bolotin, Benjamin Cho, Ugljesa Milic, Eiman Ebrahimi, Oreste Villa, Aamer Jaleel, Carole-Jean Wu, David Nellans
International Symposium on Computer Architecture (ISCA)
2016
Towards High Performance Paged Memory for GPUs
Tianhao Zheng, David Nellans, Arslan Zulfiqar, Mark Stephenson, Steve Keckler
International Symposium on High Performance Computer Architecture (HPCA)
Selective GPU Caches to Eliminate CPU-GPU HW Cache Coherence
Neha Agarwal, David Nellans, Eiman Ebrahimi, Thomas F. Wenisch, John Danskin, Steve Keckler
International Symposium on High Performance Computer Architecture (HPCA)
2015
Designing Efficient Heterogeneous Memory Architectures
Evgeny Bolotin, David Nellans, Oreste Villa, Mike O'Connor, Alex Ramirez, Steve Keckler
IEEE Micro
Flexible Software Profiling of GPU Architectures
Mark Stephenson, Siva Hari, Yunsup Lee, Eiman Ebrahimi, Daniel Johnson, David Nellans, Mike O'Connor, Steve Keckler
International Symposium on Computer Architecture (ISCA)
Page Placement Strategies for GPUs within Heterogeneous Memory Systems
Neha Agarwal, David Nellans, Mark Stephenson, Mike O'Connor, Steve Keckler
International Conference on Architectural Support for Programming Languages and…
Unlocking Bandwidth for GPUs in CC-NUMA Systems
Neha Agarwal, David Nellans, Mike O'Connor, Steve Keckler, Thomas Wenisch
International Symposium on High Performance Computer Architecture (HPCA)
2014
Scaling the Power Wall: A Path to Exascale
Oreste Villa, Daniel Johnson, Mike O'Connor, Evgeny Bolotin, David Nellans, Justin Luitjens, Nikolai Sakharnykh, Peng Wang, Paulius Micikevicius, Anthony Scudiero, Steve Keckler, William Dally
SC '14