QCalEval: Benchmarking Vision-Language Models for Quantum Calibration Plot Understanding

Quantum computing calibration depends on interpreting experimental data, and calibration plots provide the most universal human-readable representation for this task. Yet no systematic evaluation exists of how well vision-language models (VLMs) interpret them. We introduce QCalEval, the first VLM benchmark for quantum calibration plots: 243 samples across 87 scenario types from 22 experiment families, spanning superconducting qubits and neutral atoms, evaluated on six question types in both zero-shot and in-context learning settings. The best general-purpose zero-shot model reaches a mean score of 72.3. Many open-weight models degrade under multi-image in-context learning, whereas frontier closed models improve substantially. A supervised fine-tuning ablation at the 9-billion-parameter scale shows that supervision format is critical: zero-shot-formatted and in-context-learning-formatted fine-tuning improve different capabilities, and no single recipe improves open-ended analysis. As a reference case study, we release NVIDIA Ising Calibration 1, an open-weight model based on Qwen3.5-35B-A3B that reaches a zero-shot average score of 74.7.

Authors

Shuxiang Cao (NVIDIA)

Zijian Zhang (NVIDIA, University of Toronto, Vector Institute for Artificial Intelligence)

Abhishek Agarwal (National Physical Laboratory)

Grace Bratrud (Northwestern University, Fermi National Accelerator Laboratory)

Niyaz R. Beysengulov (EeroQ Corporation)

Daniel C. Cole (Infleqtion)

Alejandro Gomez Frieiro (IQM Quantum Computers)

Elena O. Glen (EeroQ Corporation)

Hao Hsu (IQM Quantum Computers)

Gang Huang (Lawrence Berkeley National Laboratory)

Raymond Jow (Conductor Quantum)

Greshma Shaji (IQM Quantum Computers)

Tom Lubowe (NVIDIA)

Luis Mantilla Calderon (NVIDIA)

Nicola Pancotti (NVIDIA)

Joel Pendleton (Conductor Quantum)

Brandon Severin (Conductor Quantum)

Charles Etienne Staub (Harvard University)

Sara Sussman (Fermi National Accelerator Laboratory)

Antti Vepsäläinen (IQM Quantum Computers)

Neel Rajeshbhai Vora (Lawrence Berkeley National Laboratory)

Yilun Xu (Lawrence Berkeley National Laboratory)

Varinia Bernales (University of Toronto)

Daniel Bowring (Fermi National Accelerator Laboratory)

Elica Kyoseva (NVIDIA)

Ivan Rungger (National Physical Laboratory, Royal Holloway University of London)

Giulia Semeghini (Harvard University)

Sam Stanwyck (NVIDIA)

Timothy Costa (NVIDIA)

Alán Aspuru-Guzik

Krysta Svore (NVIDIA)

Publication Date

Tuesday, April 14, 2026

Research Area

Computer Vision

External Links

QCalEval benchmark dataset

Evaluation Scripts

NVIDIA Ising Calibration 1 model weights

Uploaded Files

QCalEval Benchmarking Vision-Language Models.pdf (3.75 MB)