We propose SDAR (Synergy of Diffusion and AutoRegression), a large-scale diffusion language model that unites the complementary strengths of autoregressive and discrete diffusion modeling.
We have open-sourced the model weights for our dense models (1.7B, 4B, and 8B) and for our 30B MoE models (SDAR-30B-A3B-Chat and SDAR-30B-A3B-Sci).
Figure 1. Accuracy–speedup trade-off under static vs. dynamic inference; dynamic decoding is swept over thresholds and compared against the static schedule.
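Figure 1 compares a static decoding schedule against a dynamic, threshold-based one. For intuition only, the sketch below shows one way such hybrid decoding can be organized: blocks are generated left-to-right (autoregression across blocks), while the tokens inside the current block start out masked and are revealed in parallel over a few denoising steps (diffusion within a block). This is an assumed, schematic implementation rather than the repository's inference code; `model` (with an HF-style `.logits` output), `mask_id`, `decode_block`, `generate`, and the exact static/dynamic rules are illustrative placeholders.

```python
# Illustrative sketch only (assumed interfaces, not the SDAR repository's API):
# block-wise hybrid decoding. Blocks are generated left-to-right; tokens within
# the current block are filled in parallel over a few denoising steps.
import torch


@torch.no_grad()
def decode_block(model, prefix_ids, block_size, steps=4, threshold=None, mask_id=0):
    """Fill one block of `block_size` tokens appended after `prefix_ids`.

    threshold=None -> static schedule: reveal a fixed number of tokens per step.
    threshold=0.9  -> dynamic schedule: reveal every masked token whose top
                      probability exceeds the threshold (at least one per step).
    """
    device = prefix_ids.device
    block = torch.full((1, block_size), mask_id, dtype=torch.long, device=device)

    for _ in range(steps):
        still_masked = block.eq(mask_id)
        if not still_masked.any():
            break

        # One forward pass over prefix + partially masked block; only the current
        # block's positions are scored. (An HF-style `.logits` output is assumed.)
        logits = model(torch.cat([prefix_ids, block], dim=1)).logits[:, -block_size:]
        conf, pred = logits.softmax(-1).max(-1)        # per-position confidence / argmax
        conf = conf.masked_fill(~still_masked, -1.0)   # ignore already-revealed positions

        if threshold is None:
            # Static: reveal the k most confident masked positions each step.
            k = min(max(1, block_size // steps), int(still_masked.sum()))
            reveal = torch.zeros_like(still_masked)
            reveal.view(-1)[conf.view(-1).topk(k).indices] = True
        else:
            # Dynamic: reveal everything above the confidence threshold, falling
            # back to the single most confident position if none qualify.
            reveal = still_masked & (conf > threshold)
            if not reveal.any():
                reveal = torch.zeros_like(still_masked)
                reveal.view(-1)[conf.argmax()] = True

        block[reveal] = pred[reveal]

    return block


@torch.no_grad()
def generate(model, prompt_ids, max_blocks=64, block_size=16, **kw):
    """Autoregress over blocks: decode one block, append it, repeat.
    (Stopping criteria such as EOS detection are omitted for brevity.)"""
    ids = prompt_ids
    for _ in range(max_blocks):
        ids = torch.cat([ids, decode_block(model, ids, block_size, **kw)], dim=1)
    return ids
```

Under these assumptions, the dynamic threshold is the knob being swept in Figure 1: a lower threshold reveals more tokens per forward pass (greater speedup, potentially lower accuracy), whereas the static schedule fixes the number of refinement steps per block.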

Note: parenthesized deltas are relative to the Qwen3 AR-SFT baseline of the same scale (Qwen3-1.7B-AR-SFT for SDAR-1.7B-Chat, Qwen3-30B-AR-SFT for SDAR-30B-A3B-Chat).

Benchmark | SDAR-1.7B-Chat | SDAR-4B-Chat | SDAR-8B-Chat | SDAR-30B-A3B-Chat | LLaDA-8B | Dream-7B | Qwen3-1.7B-Base | Qwen3-1.7B-AR-SFT | Qwen3-30B-Base | Qwen3-30B-AR-SFT |
---|---|---|---|---|---|---|---|---|---|---|
MMLU | 62.9 (-0.9) | 74.9 | 78.6 | 82.8 (+0.6) | 65.9 | 69.5 | 62.6 | 63.8 | 81.4 | 82.2 |
GSM8K | 80.1 (-1.0) | 89.9 | 91.3 | 91.4 (-1.3) | 78.6 | 81.0 | 75.4 | 81.1 | 91.8 | 92.7 |
Math500 | 63.2 (+1.2) | 72.8 | 78.6 | 77.8 (+1.0) | – | – | 43.5 | 62.0 | 59.0 | 76.8 |
MathBench | 63.6 (+3.1) | 74.7 | 76.9 | 79.3 (+0.9) | – | – | – | 60.5 | – | 78.4 |
HumanEval | 61.6 (-4.3) | 72.0 | 78.7 | 87.2 (+2.4) | 47.6 | 55.5 | – | 65.9 | – | 84.8 |
MBPP | 61.1 (-0.8) | 65.4 | 72.0 | 71.6 (-3.5) | 34.2 | 58.8 | 55.4 | 61.9 | 74.4 | 75.1 |
IFEval | 43.4 (+0.1) | 56.6 | 61.4 | 60.6 (+2.9) | 59.9 | 62.5 | – | 43.3 | – | 57.7 |

Note: parenthesized deltas are relative to the AR-30B-A3B-Sci baseline in the first results column.

Benchmark | AR-30B-A3B-Sci | SDAR-30B-A3B-Sci (greedy) | SDAR-30B-A3B-Sci (sample) | Intern-S1 (235B-A22B) | InternVL3-78B | Qwen2.5-VL-72B | DeepSeek-R1-0528 | Qwen3-235B-A22B | Kimi-K2-Instruct | Gemini-2.5 Pro | o3 | Grok-4 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
MMLU-pro | 78.3 | 80.2 (+1.9) | 80.6 (+2.3) | 83.5 | 73.0 | 72.1 | 83.4 | 82.2 | 82.7 | 86.0 | 85.0 | 85.9 |
GPQA-diamond | 61.2 | 73.7 (+12.5) | 71.8 (+10.6) | 77.3 | 49.9 | 49.0 | 80.6 | 71.1 | 77.8 | 83.8 | 83.3 | 87.5 |
AIME2024 | 74.9 | 73.3 (-1.6) | 76.2 (+1.3) | – | – | – | – | – | – | – | – | – |
AIME2025 | 60.7 | 63.3 (+2.6) | 62.2 (+1.5) | 86.0 | 10.7 | 10.9 | 87.5 | 81.5 | 51.4 | 83.0 | 88.9 | 91.7 |
LiveMathBench-hard | 55.4 | 60.7 (+5.3) | 57.9 (+2.5) | – | – | – | – | – | – | – | – | – |
LiveCodeBench-v5 | 51.5 | 40.7 (-10.8) | 49.1 (-2.4) | – | – | – | – | – | – | – | – | – |
LiveCodeBench-v6 | 46.3 | 42.3 (-4.0) | 51.4 (+5.1) | – | – | – | – | – | – | – | – | – |
ChemBench | 60.5 | 75.1 (+14.6) | 75.1 (+14.6) | 83.4 | 61.3 | 61.6 | 75.6 | 75.8 | 75.3 | 82.8 | 81.6 | 83.3 |
PHYSICS | 39.0 | 52.9 (+13.9) | 55.6 (+16.6) | 44.0 | 23.1 | 15.7 | – | – | – | 40.0 | 47.9 | 42.8 |
ProteinLMBench | 59.5 | 60.7 (+1.2) | 60.0 (+0.5) | 63.1 | 61.6 | 61.0 | 61.4 | 59.8 | 66.7 | 62.9 | 67.7 | 66.2 |

```bibtex
@misc{JetAstra2025,
  title       = {SDAR: A Synergistic Diffusion–AutoRegression Paradigm for Scalable Sequence Generation},
  author      = {Shuang Cheng and Yihan Bian and Dawei Liu and Yuhua Jiang and Yihao Liu and Linfeng Zhang and Wenhai Wang and Qipeng Guo and Kai Chen and Biqing Qi and Bowen Zhou},
  year        = {2025},
  institution = {Shanghai AI Lab},
  url         = {https://github.com/JetAstra/SDAR}
}
```