 
 # SPEED-Bench: A Unified and Diverse Benchmark for Speculative Decoding


 Speculative Decoding (SD) has emerged as a critical technique for accelerating Large Language Model (LLM) inference. Unlike deterministic system optimizations, SD performance is inherently data-dependent, so diverse and representative workloads are essential for accurately measuring its effectiveness. Existing benchmarks suffer from limited task diversity, inadequate support for throughput-oriented evaluation, and a reliance on high-level implementations that fail to reflect production environments. To address this, we introduce SPEED-Bench, a comprehensive suite designed to standardize SD evaluation across diverse semantic domains and realistic serving regimes. SPEED-Bench offers a carefully curated Qualitative data split, selected by prioritizing semantic diversity across samples. It also includes a Throughput data split that enables speedup evaluation across a range of concurrency levels, from latency-sensitive low-batch settings to throughput-oriented high-load scenarios. By integrating with production engines such as vLLM and TensorRT-LLM, SPEED-Bench allows practitioners to analyze system behaviors that other benchmarks often mask. We highlight this by quantifying how synthetic inputs overestimate real-world throughput, identifying batch-size-dependent optimal draft lengths and biases in low-diversity data, and analyzing the caveats of vocabulary pruning in state-of-the-art drafters. We release SPEED-Bench to establish a unified evaluation standard for practical comparisons of SD algorithms.
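Why SD speedup is data-dependent, and why the best draft length is not a fixed constant, can be illustrated with the simplified analytical model from the original speculative decoding analysis (Leviathan et al., 2023). The sketch below is not SPEED-Bench's methodology. It assumes each drafted token is accepted independently with probability `alpha`, that one draft step costs `c` times one target step, and that verifying `gamma + 1` tokens costs the same as a single target step. That last assumption roughly holds only at low batch sizes, which is one reason the optimal draft length shifts under higher load. All function names are hypothetical.

```python
def expected_tokens_per_step(alpha: float, gamma: int) -> float:
    """Expected tokens emitted per target forward pass, assuming each of the
    gamma drafted tokens is accepted independently with probability alpha."""
    if alpha >= 1.0:
        # Every draft token is accepted, plus the target's bonus token.
        return gamma + 1.0
    return (1.0 - alpha ** (gamma + 1)) / (1.0 - alpha)


def estimated_speedup(alpha: float, gamma: int, c: float) -> float:
    """Walltime improvement over plain autoregressive decoding.

    c is the cost of one draft step relative to one target step. Assumes
    verifying gamma + 1 tokens costs one target step (a low-batch assumption).
    """
    return expected_tokens_per_step(alpha, gamma) / (gamma * c + 1.0)


def best_draft_length(alpha: float, c: float, max_gamma: int = 16) -> int:
    """Draft length in [1, max_gamma] with the highest estimated speedup."""
    return max(range(1, max_gamma + 1), key=lambda g: estimated_speedup(alpha, g, c))


if __name__ == "__main__":
    # Acceptance rate is a property of the workload: easier or more repetitive
    # data yields higher alpha and favors longer drafts.
    for alpha in (0.5, 0.7, 0.9):
        g = best_draft_length(alpha, c=0.05)
        print(f"alpha={alpha}: best gamma={g}, "
              f"speedup={estimated_speedup(alpha, g, 0.05):.2f}x")
```

Under this toy model, a workload with a high acceptance rate favors much longer drafts than one with a moderate rate, which is one way low-diversity or synthetic inputs can bias conclusions about draft length and speedup.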


 ## Authors


Talor Abramovich (NVIDIA)

Maor Ashkenazi (NVIDIA)

Carl Putterman (NVIDIA)

Benjamin Chislett (NVIDIA)

Tiyasa Mitra (NVIDIA)

Bita Darvish Rouhani (NVIDIA)

Ran Zilberstein (NVIDIA)

Yonatan Geifman (NVIDIA)

 
 ## Publication Date


Monday, February 23, 2026

 
 ## Research Area


[Artificial Intelligence and Machine Learning ](/research-area/machine-learning-artificial-intelligence)

 
 ## Uploaded Files


[SPEED\_Bench\_Paper.pdf](https://d1qx31qr3h6wln.cloudfront.net/publications/SPEED_Bench_Paper.pdf) (1.06 MB)