R1 Zero GRPO Resources

Tiu Mo

A curated collection of resources on R1 Zero and GRPO (Group Relative Policy Optimization), covering implementations, training frameworks, and research.
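As a quick orientation for the entries below, the core idea behind GRPO is that each sampled response is scored relative to the other responses drawn for the same prompt, rather than against a learned value baseline. The following is a minimal sketch of that normalization step only, not any listed project's implementation. The function name `group_relative_advantages`, the `eps` term, and the use of the population standard deviation are assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of responses to the same prompt.

    Hypothetical helper: subtracts the group mean and divides by the group
    standard deviation (population std assumed here), with eps guarding
    against division by zero when all rewards are equal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled responses to one prompt, scored by a binary correctness reward.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)

# A group where every response gets the same reward carries no learning signal.
flat = group_relative_advantages([1.0, 1.0, 1.0])
```

Responses that beat their group's average receive positive advantages and the rest receive negative ones, so the advantages of a group sum to roughly zero.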

Official Implementations

Training Tools & Frameworks

T2I-R1: Reinforcing Image Generation with Collaborative Semantic-level and Token-level CoT https://github.com/CaraJ7/T2I-R1

Thinking with Images

DeepEyes: Incentivizing "Thinking with Images" via Reinforcement Learning
Pixel Reasoner: Incentivizing Pixel-Space Reasoning with Curiosity-Driven Reinforcement Learning

VisualToolAgent (VisTA): A Reinforcement Learning Framework for Visual Tool Selection

Thinking with Generated Images

• SAM-R1: Leveraging SAM for Reward Feedback in Multimodal Segmentation via Reinforcement Learning

• Omni-R1: Reinforcement Learning for Omnimodal Reasoning via Two-System Collaboration

• GRIT: Teaching MLLMs to Think with Images

• Chain-of-Focus: Adaptive Visual Search and Zooming for Multimodal Reasoning via RL

• Pixel Reasoner: Incentivizing Pixel-Space Reasoning with Curiosity-Driven Reinforcement Learning

• UniVG-R1: Reasoning Guided Universal Visual Grounding with Reinforcement Learning

• OpenThinkIMG: Learning to Think with Images via Visual Tool Reinforcement Learning

• Perception-R1: Pioneering Perception Policy with Reinforcement Learning

Documentation

Additional Resources

https://github.com/qiwang067/awesome-visual-rl

https://github.com/datawhalechina/easy-rl?tab=readme-ov-file Reinforcement learning tutorial

Training Frameworks

https://github.com/Simple-Efficient/RL-Factory

Physical-rule reasoning model https://github.com/nvidia-cosmos/cosmos-reason1