Requirements & tools
Anthropic
The Research Engineer, Code Reinforcement Learning at Anthropic advances Claude’s ability to write, edit, test, debug, and ship real software end-to-end. Responsibilities include designing RL environments and coding tasks; building the reward signals and verifiers that capture what good code means; running training experiments on frontier models; diagnosing why a model does (or doesn’t) improve at specific classes of software-engineering work; and improving the speed and reliability of the pipelines that enable rapid experimentation. Code RL spans several focus areas, from agentic coding behaviors and code correctness, to long-horizon autonomous engineering, to high-performance code for accelerators, and the team matches candidates to the highest-impact areas.
Required: strong software-engineering skills with deep Python expertise (including async/concurrent programming); comfort owning systems end-to-end and debugging across the stack; the ability to balance research exploration with engineering implementation; rigor in designing experiments and interpreting results; care for code quality, testing, and performance; and passion for safe and beneficial AI systems. Strong candidates also have experience with RL, RLHF, post-training, or LLM fine-tuning, or have built coding agents, code-execution sandboxes, or similar systems.
Compensation: $500,000-$850,000 base plus substantial equity (Anthropic stock).
Role context
Research Engineers on the Code Reinforcement Learning team at frontier AI labs advance models’ ability to write, edit, test, debug, and ship real software end-to-end on real codebases with real tools. At Anthropic, this role within the RL organization blends research and engineering: designing RL environments and coding tasks, building reward signals and verifiers that capture what “good code” means, running training experiments on frontier models, and diagnosing why a model does or doesn’t improve at a class of software-engineering work. The team contributes to every Claude model release, with significant impact on coding capabilities. The $500,000-$850,000 base range plus equity reflects frontier-AI-research pay bands.
Frequently Asked Questions
What does "Code RL" focus on at Anthropic specifically?
Code Reinforcement Learning at Anthropic spans several focus areas: agentic coding behaviors (models that plan multi-step software changes), code correctness (verifying that the code actually works), long-horizon autonomous engineering (models that work on weeks-long projects), and high-performance code for accelerators (CUDA, Triton, Pallas optimization). The team matches engineers to whichever subarea has the highest leverage at hiring time.
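To make "code correctness" and "verifier" concrete, here is a minimal Python sketch of a test-based reward: it runs a candidate solution against a list of assertion snippets and returns the fraction that pass. This is an illustration only, not Anthropic's method. The function name `fraction_passing`, its parameters, and the scoring rule are all assumptions made for the example, and a bare subprocess is not a real sandbox.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def fraction_passing(candidate_source: str, tests: list[str], timeout_s: float = 5.0) -> float:
    """Hypothetical reward: share of test snippets that exit cleanly.

    Each test snippet is appended to the candidate code and run in a
    separate Python process. A non-zero exit code or a timeout counts
    as a failure. NOTE: a subprocess gives no real isolation; a
    production verifier would need a proper sandbox.
    """
    if not tests:
        return 0.0

    passed = 0
    with tempfile.TemporaryDirectory() as workdir:
        for i, test in enumerate(tests):
            script = Path(workdir) / f"case_{i}.py"
            script.write_text(candidate_source + "\n\n" + test + "\n")
            try:
                result = subprocess.run(
                    [sys.executable, str(script)],
                    capture_output=True,
                    timeout=timeout_s,
                    cwd=workdir,
                )
            except subprocess.TimeoutExpired:
                # Treat hangs (e.g. infinite loops) as failures.
                continue
            if result.returncode == 0:
                passed += 1

    return passed / len(tests)


if __name__ == "__main__":
    candidate = "def add(a, b):\n    return a + b\n"
    tests = [
        "assert add(1, 2) == 3",
        "assert add(-1, 1) == 0",
        "assert add(2, 2) == 5",  # deliberately wrong expectation
    ]
    print(fraction_passing(candidate, tests))
```

A pass-rate score like this is easy to compute but also easy to game (for example, by special-casing the test inputs), which is one reason the listing treats verifier design as a core part of the role.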
Is prior RL experience required for this Code RL role?
Prior RL experience is listed as a strong bonus, not a strict requirement. The firmer requirement is strong software-engineering skill with deep Python expertise. Engineers with backgrounds in compilers, language tooling, code execution, or developer infrastructure are competitive without RL specifics. The first 3-6 months involve significant RL ramp-up; candidates who can demonstrate fast learning and quantitative rigor can compensate for a missing RL background.
What is Anthropic's compensation philosophy for senior research engineers?
Compensation at Anthropic for senior research engineers is among the highest in the industry, as the $500,000-$850,000 base range shows. Total compensation including equity often exceeds $1M for highly experienced candidates. The philosophy is to compete with top labs (OpenAI, DeepMind) and frontier hedge funds for the same talent. The range also reflects the global scarcity of senior engineers with both RL and LLM experience.