Stephen Keckler

Stephen W. Keckler, Ph.D.
Vice President of Research

Steve Keckler joined NVIDIA in 2009 and leads the Architecture Research Group. He is also an Adjunct Professor of Computer Science at the University of Texas at Austin, where he served on the faculty from 1998 to 2012. His research interests include parallel computer architectures, high-performance computing, energy-efficient architectures, and embedded computing. Dr. Keckler was previously at the Massachusetts Institute of Technology from 1990 to 1998, where he led the development of the M-Machine experimental parallel computer system. He is a Fellow of the ACM, a Fellow of the IEEE, an Alfred P. Sloan Research Fellow, and a recipient of the NSF CAREER award, the ACM Grace Murray Hopper award, the President's Associates Teaching Excellence Award at UT-Austin, and the Edith and Peter O’Donnell award for Engineering. He earned a B.S. in Electrical Engineering from Stanford University and an M.S. and a Ph.D. in Computer Science from the Massachusetts Institute of Technology.

Research Interests:

Parallel and Serial Computer Architectures, Memory Systems, Interconnection Networks, High-Performance Computing, Low-Power Computing

A Real-time Energy-Efficient Superpixel Hardware Accelerator for Mobile Computer Vision Applications
Unifying Primary Cache, Scratch, and Register File Memories in a Throughput Processor
A Hierarchical Thread Scheduler and Register File for Energy-Efficient Throughput Processors
A Compile-Time Managed Multi-Level Register File Hierarchy
GPUs and the Future of Parallel Computing
Energy-efficient Mechanisms for Managing Thread Context in Throughput Processors
Preemptive Virtual Clock: A Flexible, Efficient and Cost-effective QOS Scheme for Networks-on-a-Chip
An Evaluation of the TRIPS Computer System
Express Cube Topologies for On-Chip Interconnects
Counting Dependence Predictors
High Performance Linear Algebra on a Spatially Distributed Processor
Composable Lightweight Processors
Reconciling Performance and Programmability in Networking Systems
An Adaptive, Non-Uniform Cache Structure for Wire-Delay Dominated On-Chip Caches