This is the official code base for the paper RLVR-World: Training World Models with Reinforcement Learning.
Give it a star 🌟 if you find our work useful!
- 🚩 2025.10.28: NeurIPS 2025 camera-ready version is released on arXiv.
 - 🚩 2025.09.18: RLVR-World has been accepted by NeurIPS 2025, congrats!
 - 🚩 2025.05.26: We release all models and datasets.
 - 🚩 2025.05.21: We open-source our training code.
 - 🚩 2025.05.21: Our paper is released on arXiv.
 
We pioneer training world models through reinforcement learning with verifiable rewards (RLVR):
- World models across various modalities (particularly, language and videos) are unified under a sequence modeling formulation;
 - Task-specific prediction metrics serve as verifiable rewards directly optimized by RL.
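
To make the second point concrete, here is a minimal sketch of how a prediction metric can act as a verifiable reward. It is not the repository's implementation: the function names `binary_reward`, `token_f1_reward`, and `negative_mse_reward` are hypothetical, and token-level F1 and mean squared error are stand-in metrics chosen for illustration. The metrics actually used for each task are defined in the paper and the training code.

```python
from collections import Counter
from typing import Sequence


def binary_reward(predicted: str, target: str) -> float:
    """Return 1.0 only when the predicted next state matches the target exactly."""
    return 1.0 if predicted.strip() == target.strip() else 0.0


def token_f1_reward(predicted: str, target: str) -> float:
    """Graded reward (illustrative): token-level F1 between prediction and target."""
    pred_tokens = predicted.split()
    target_tokens = target.split()
    if not pred_tokens or not target_tokens:
        return 0.0
    # Multiset intersection counts shared tokens with multiplicity.
    overlap = sum((Counter(pred_tokens) & Counter(target_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(target_tokens)
    return 2 * precision * recall / (precision + recall)


def negative_mse_reward(predicted: Sequence[float], target: Sequence[float]) -> float:
    """Reward for frame-like predictions (illustrative): lower error gives higher reward."""
    if len(predicted) != len(target) or len(target) == 0:
        raise ValueError("predicted and target must be non-empty and the same length")
    return -sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)
```

Because each reward is computed directly from the prediction and the ground-truth next state, it can be checked automatically for every sampled rollout, which is what makes it usable as an RL training signal.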
 
At the moment, we provide the following models and datasets:
| Modality | Type | Domain | Name | 
|---|---|---|---|
| Language | Dataset | Text game | bytesized32-world-model-cot | 
| Language | World model | Text game | bytesized32-world-model-sft | 
| Language | World model | Text game | bytesized32-world-model-rlvr-binary-reward | 
| Language | World model | Text game | bytesized32-world-model-rlvr-task-specific-reward | 
| Language | Dataset | Web navigation | webarena-world-model-cot | 
| Language | World model | Web navigation | webarena-world-model-sft | 
| Language | World model | Web navigation | webarena-world-model-rlvr | 
| Video | Tokenizer | Robot manipulation | rt1-frame-tokenizer | 
| Video | World model | Robot manipulation | rt1-world-model-single-step-base | 
| Video | World model | Robot manipulation | rt1-world-model-single-step-rlvr | 
| Video | Tokenizer | Robot manipulation | rt1-compressive-tokenizer | 
| Video | World model | Robot manipulation | rt1-world-model-multi-step-base | 
| Video | World model | Robot manipulation | rt1-world-model-multi-step-rlvr | 

See lang_wm:
- Text game state prediction
 - Web page state prediction
 - Application: Model predictive control for web agents
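
As a rough illustration of the model predictive control idea, the sketch below performs one-step lookahead: a world model imagines the next page state for each candidate action, and the agent keeps the action whose imagined state scores best. Everything here is an assumption made for illustration, including `choose_action`, `toy_world_model`, and `toy_score`; the actual agent pipeline lives in `lang_wm` and may differ substantially.

```python
from typing import Callable, Sequence


def choose_action(
    state: str,
    candidate_actions: Sequence[str],
    world_model: Callable[[str, str], str],
    score_state: Callable[[str], float],
) -> str:
    """One-step lookahead: simulate each candidate action and keep the best-scoring one."""
    if not candidate_actions:
        raise ValueError("need at least one candidate action")
    best_action = candidate_actions[0]
    best_score = float("-inf")
    for action in candidate_actions:
        predicted_state = world_model(state, action)  # imagined next page state
        score = score_state(predicted_state)
        if score > best_score:  # strict comparison: ties keep the earlier action
            best_action, best_score = action, score
    return best_action


# Toy stand-ins (hypothetical): a real setup would query a trained language
# world model and a task-aware scorer instead.
def toy_world_model(state: str, action: str) -> str:
    return f"{state} -> after {action}"


def toy_score(predicted_state: str) -> float:
    return 1.0 if "checkout" in predicted_state else 0.0


best = choose_action(
    "cart page", ["click home", "click checkout"], toy_world_model, toy_score
)
print(best)
```

Passing the world model and scorer as plain callables keeps the planning loop independent of any particular model or serving setup.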
 
See vid_wm:
- Robot manipulation trajectory prediction
 - Application: Real2sim policy evaluation
 
Release progress:
- Video world model with RLVR
 - Pre-trained & post-trained video world model weights
 - Real2sim policy evaluation with video world models
 - Text game SFT data
 - Web page SFT data
 - Language world model on text games with RLVR
 - Language world model on web pages with RLVR
 - Post-trained language world model weights
 - Web agents with language world models
 
If you find this project useful, please cite our paper as:
@inproceedings{wu2025rlvr,
    title={RLVR-World: Training World Models with Reinforcement Learning}, 
    author={Jialong Wu and Shaofeng Yin and Ningya Feng and Mingsheng Long},
    booktitle={Advances in Neural Information Processing Systems},
    year={2025},
}
If you have any questions, please contact wujialong0229@gmail.com.
We sincerely appreciate the following GitHub repos, whose valuable codebases we build upon:

