🧠 ReasonFlux Series
Advanced Open-Source LLM Post-Training Suite
Princeton University & PKU & UIUC & University of Chicago & ByteDance Seed
🎯 Mission: Building next-generation reasoning capabilities through LLM post-training algorithms that focus on data selection, reinforcement learning, and inference scaling.
🚀 What Makes the ReasonFlux Series Special?
1. Trajectory-Aware Process Reward Models for Long-CoT Reasoning (ReasonFlux-PRM, NeurIPS 2025)
Trajectory-aware process reward models that provide dense, step-level supervision for both offline data selection and online policy optimization in long chain-of-thought (CoT) reasoning.
2. Co-Evolved RL for LLM Coder and Unit Tester (ReasonFlux-Coder, NeurIPS 2025 Spotlight)
An approach in which an LLM coder and a unit tester co-evolve through reinforcement learning, yielding more robust coding capabilities.