AceMath: Advancing Frontier Math Reasoning with Post-Training and Reward Modeling
Paper · Model weights (coming soon) 🤗 · Training data (coming soon) 🤗 · Benchmark (coming soon) 🤗
Authors: Zihan Liu*, Yang Chen*, Mohammad Shoeybi, Bryan Catanzaro, Wei Ping
* Equal contribution
Posted: Wei Ping
Overview
We introduce AceMath, a family of frontier math reasoning models that set new state-of-the-art accuracy on math reasoning benchmarks. AceMath outperforms both leading open-access models (e.g., Qwen2.5-Math-72B-Instruct) and proprietary models (e.g., GPT-4o (2024-08-06) and Claude 3.5 Sonnet (2024-10-22)).
We compare AceMath to leading proprietary and open-access math models in the table above. Our AceMath-7B-Instruct largely outperforms the previous best-in-class Qwen2.5-Math-7B-Instruct (average pass@1: 67.2 vs. 62.9) on a variety of math reasoning benchmarks (detailed results in Figure 1), while coming close to the performance of the 10× larger Qwen2.5-Math-72B-Instruct (67.2 vs. 68.2). Notably, our AceMath-72B-Instruct outperforms the state-of-the-art Qwen2.5-Math-72B-Instruct (71.8 vs. 68.2), GPT-4o (67.4), and Claude 3.5 Sonnet (65.6) by a clear margin. We also report the rm@8 accuracy (best of 8) achieved by our reward model, AceMath-72B-RM, which sets a new record on these reasoning benchmarks (excluding OpenAI's o1 model, which relies on scaled-up inference computation).
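To make the rm@8 metric concrete, here is a minimal sketch of best-of-n reranking: sample n = 8 candidate solutions from the policy model, score each with the reward model, and keep the highest-scoring one. The function names and callable interfaces below are illustrative assumptions, not the released AceMath API.

```python
from typing import Callable, List

def rm_at_n(
    problem: str,
    generate: Callable[[str], str],      # policy model: prompt -> one sampled solution
    score: Callable[[str, str], float],  # reward model: (problem, solution) -> scalar score
    n: int = 8,
) -> str:
    """Return the candidate solution the reward model ranks highest (rm@n)."""
    candidates: List[str] = [generate(problem) for _ in range(n)]
    scores = [score(problem, c) for c in candidates]
    best_idx = max(range(n), key=lambda i: scores[i])
    return candidates[best_idx]
```

Accuracy under rm@8 is then simply pass@1 computed on the reranked outputs, which isolates how much the reward model improves over greedy single-sample decoding.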
Technical highlights
Here are the key technical highlights of our work:
- We introduce an SFT process designed to first achieve competitive performance across general domains, including multidisciplinary topics, coding, and math. Building on this, the general SFT model is further fine-tuned in the math domain using a meticulously curated set of prompts and synthetically generated responses.
- We conduct a systematic investigation of training techniques for building math-specialized reward models, focusing on key aspects such as the construction of positive-negative pairs, training objectives, and the elimination of stylistic biases from specific LLMs (a common pairwise objective is sketched after this list).
- We will open source the model weights for AceMath-Instruct and AceMath-RM, along with the complete training data used across all stages of their development.
- We also release AceMath-RewardBench, a comprehensive benchmark for evaluating math reward models, offering diverse datasets, varying difficulty levels, and robustness to variations in response styles.
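As a concrete illustration of training on positive-negative pairs, below is a minimal sketch of the pairwise Bradley-Terry objective, which pushes the reward of a correct (chosen) response above that of an incorrect (rejected) one. This shows the general technique under our stated assumptions; it is not the exact AceMath-RM training code.

```python
import torch
import torch.nn.functional as F

def bradley_terry_loss(
    chosen_rewards: torch.Tensor,    # scalar scores for correct responses, shape (batch,)
    rejected_rewards: torch.Tensor,  # scalar scores for incorrect responses, shape (batch,)
) -> torch.Tensor:
    """Pairwise ranking loss: -log sigmoid(r_chosen - r_rejected), batch-averaged."""
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()

# Example usage: rewards produced by a scalar head on top of an LLM backbone
# (values here are made up for illustration).
chosen = torch.tensor([1.2, 0.7, 2.1])
rejected = torch.tensor([0.3, 0.9, -0.5])
loss = bradley_terry_loss(chosen, rejected)  # backpropagate through the reward model
```

Because the loss depends only on the score gap within each pair, it is invariant to a constant offset in the rewards, which is one reason pair construction and style debiasing matter as much as the objective itself.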
Citation
@article{acemath2024,
  title={AceMath: Advancing Frontier Math Reasoning with Post-Training and Reward Modeling},
  author={Liu, Zihan and Chen, Yang and Shoeybi, Mohammad and Catanzaro, Bryan and Ping, Wei},
  journal={arXiv preprint},
  year={2024}
}