🎯 Focusing
    Master's student @ THU
- Tsinghua University
- Qingdao
- 18:01 (UTC +08:00)
- https://ryanliu112.github.io
- https://scholar.google.com/citations?user=LiIfGakAAAAJ
Highlights
- Pro
Pinned
- TsinghuaC3I/Awesome-RL-for-LRMs (Public): A Survey of Reinforcement Learning for Large Reasoning Models
- TsinghuaC3I/MARTI (Public): A Framework for LLM-based Multi-Agent Reinforced Training and Inference
- compute-optimal-tts (Public): Official codebase for "Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling".
- Awesome-Process-Reward-Models (Public): A comprehensive collection of process reward models.
- wizard-III/ArcherCodeR (Public): ArcherCodeR is an open-source initiative enhancing code reasoning in large language models through scalable, rule-governed reinforcement learning.
