Research Engineer — Code Reinforcement Learning

Date Posted
Today
Remote Work Level
Hybrid Remote
Location
San Francisco, CA
Salary
$500,000 - $850,000
Job Schedule
Full-time
Benefits
Substantial equity (Anthropic stock)
Comprehensive health, dental, vision
401(k) with employer match
Generous PTO
Learning and development support

Requirements & tools

Education
BS, MS, or PhD in Computer Science, ML, or equivalent
Tools & systems
Python (async, concurrent)
PyTorch (RL training)
RL frameworks (custom + open source)
Cloud infrastructure (AWS, GCP)
Evaluation and verifier systems

Anthropic

The Research Engineer, Code Reinforcement Learning, at Anthropic advances Claude’s ability to write, edit, test, debug, and ship real software end-to-end. Responsibilities include designing RL environments and coding tasks; building the reward signals and verifiers that capture what good code means; running training experiments on frontier models; diagnosing why a model does (or doesn’t) improve at specific classes of software-engineering work; and improving the speed and reliability of the pipelines that enable rapid experimentation. Code RL spans several focus areas, from agentic coding behaviors and code correctness, to long-horizon autonomous engineering, to high-performance code for accelerators, and the team matches candidates to the highest-impact areas. Required: strong software-engineering skills with deep Python expertise (including async/concurrent programming); comfort owning systems end-to-end and debugging across the stack; the ability to balance research exploration with engineering implementation; rigor in experimental design and in interpreting results; care for code quality, testing, and performance; and a passion for safe and beneficial AI systems. Strong candidates also have experience with RL, RLHF, post-training, or LLM fine-tuning, or have built coding agents, code-execution sandboxes, or similar systems. Compensation: $500,000-$850,000 base plus substantial equity (Anthropic stock).

Role context

Research Engineers on the Code Reinforcement Learning team at frontier AI labs advance models’ ability to write, edit, test, debug, and ship real software end-to-end on real codebases with real tools. At Anthropic, this role within the RL organization blends research and engineering — designing RL environments and coding tasks, building reward signals and verifiers that capture what “good code” means, running training experiments on frontier models, and diagnosing why a model does or doesn’t improve at a class of software-engineering work. The team contributes to every Claude model release with significant impacts on coding capabilities. Compensation reflects frontier-AI-research pay banding at $500,000-$850,000 plus equity.

Quick facts

State employment
35,000
Min experience
7 years
Hiring cycle
45 days
Top skills
RL environment design for code tasks
Reward signal and verifier development
Frontier model training experimentation
Async/concurrent Python programming
Cross-stack debugging (data pipelines to model training)

Frequently Asked Questions

What does "Code RL" focus on at Anthropic specifically?

Code Reinforcement Learning at Anthropic spans several focus areas: agentic coding behaviors (models that plan multi-step software changes), code correctness (verifying that the code actually works), long-horizon autonomous engineering (models that work on weeks-long projects), and high-performance code for accelerators (CUDA, Triton, Pallas optimization). The team matches engineers to whichever subarea has the highest leverage at hiring time.

Is prior RL experience required for this Code RL role?

Prior RL experience is listed as a strong bonus, not a strict requirement. The harder requirement is strong software-engineering skill with deep Python expertise. Engineers with backgrounds in compilers, language tooling, code execution, or developer infrastructure are competitive without RL-specific experience. The first 3-6 months involve significant RL ramp-up; candidates who can demonstrate fast learning and quantitative rigor can compensate for a missing RL background.

What is Anthropic's compensation philosophy for senior research engineers?

Compensation at Anthropic for senior research engineers is among the highest in the industry, and the $500,000-$850,000 base reflects this. Total compensation including equity often exceeds $1M for highly experienced candidates. The philosophy is to compete with top labs (OpenAI, DeepMind) and frontier hedge funds for the same talent. Compensation reflects the global scarcity of senior RL and LLM engineering talent.

This listing aggregates publicly posted role information and adds market context. AIJobSearch.us operates in a commercial relationship with our partner platform.
Tech, AI Research, Reinforcement Learning

About Anthropic

Anthropic
Operating in
Tech, AI Research, Reinforcement Learning
Location
San Francisco, CA