Composer 2: Build Autonomous Coding Agents with Two-Phase LLM Training

39 views · 5:56 · Apr 26, 2026

Composer 2 Technical Report: https://arxiv.org/abs/2603.24477

Composer 2 is a specialized model designed for agentic software engineering. The model demonstrates strong long-term planning and coding intelligence while maintaining the ability to efficiently solve problems for interactive use. The model is trained in two phases: first, continued pretraining to improve the model's knowledge and latent coding ability, followed by large-scale reinforcement learning to improve end-to-end coding performance through stronger reasoning, accurate multi-step execution, and coherence on long-horizon realistic coding problems. We develop infrastructure to support training in the same Cursor harness that is used by the deployed model, with equivalent tools and structure, and use environments that match real problems closely. To measure the ability of the model on increasingly difficult tasks, we introduce a benchmark derived from real software engineering problems in large codebases including our own. Composer 2 is a frontier-level coding model and demonstrates a process for training strong domain-specialized models. On our CursorBench evaluations the model achieves a major improvement in accuracy compared to previous Composer models (61.3). On public benchmarks the model scores 61.7 on Terminal-Bench and 73.7 on SWE-bench Multilingual in our harness, comparable to state-of-the-art systems.

Composer 2 is an autonomous software engineering agent designed for multi-step code generation across large codebases. This breakdown covers its two-phase training: continued pretraining for coding knowledge, followed by large-scale reinforcement learning for long-horizon planning and execution. It explains how Cursor Research built a realistic IDE-aligned training harness to reduce distribution shift and improve reliability. The system enables low-latency interactive coding while maintaining deep reasoning across complex engineering tasks.
Benchmarks show strong performance on CursorBench and on public benchmarks (61.7 on Terminal-Bench and 73.7 on SWE-bench Multilingual), demonstrating improved multi-file execution, planning accuracy, and scalable AI coding workflows for repository-wide automation.

Timestamps:
0:00 Autonomous Software Engineering Agents
0:17 Latency vs Long-Term Reasoning Tradeoff
0:31 Composer 2 System Overview
0:56 Two-Phase Training Architecture
1:15 Knowledge Acquisition vs Behavioral Execution
1:37 Distribution Shift in AI Training
2:00 Phase One Pre-Training for Code Knowledge
2:33 Reinforcement Learning for Planning
3:01 Multi-Step Execution and Long-Horizon Tasks
3:24 Reward Hacking and Training Risks

🤖 autonomous coding agents and repository-wide AI
🧠 two-phase training architecture for LLMs
⚡ reinforcement learning for multi-step execution
📊 distribution shift and real-world deployment
💻 IDE-aligned training environments and harness
📈 benchmark performance and coding accuracy
🔗 multi-file reasoning and planning systems
⚙️ scalable AI software engineering workflows
🧩 bridging knowledge and execution in AI

High-performing AI coding systems require structured training, not just larger models. Separating knowledge acquisition from execution supports reliable multi-step planning, lower failure rates, and better scalability. Engineers who use reinforcement learning and environment-aligned training can get stronger automation, faster development cycles, and higher accuracy across complex codebases.

#AICoding #SoftwareAgents #LLMTraining