AtomThink framework for multimodal reasoning

Can Atomic Step Decomposition Enhance the Self-structured Reasoning of Multimodal Large Models? (arXiv)

AtomThink introduces four key modules (a data engine, atomic step fine-tuning, policy-guided multi-turn inference, and an atomic capability metric) to bring structured slow-thinking capabilities into visual understanding models. The authors report a 5× improvement in data utilization and an 85.3% improvement in inference efficiency over state-of-the-art structured chain-of-thought (CoT) methods.

Kun Xiang, Zhili Liu, Zihao Jiang, Yunshuang Nie, Kaixin Cai, Yiyang Yin, Runhui Huang, Haoxiang Fan, Hanhui Li, Weiran Huang, Yihan Zeng, Yu-Jie Yuan, Jianhua Han, Lanqing Hong, Hang Xu, Xiaodan Liang (2025)