ML Research Engineer Jobs $500K+

Date Posted

Today

Remote Work Level

Hybrid Remote

Location

San Francisco, CA

Salary

$500,000 - $850,000

Job Schedule

Full-time

Benefits

Substantial equity (Anthropic stock)
Comprehensive health, dental, vision
401(k) with employer match
Generous PTO
Learning and development support

Requirements & tools

Education

BS, MS, or PhD in Computer Science, ML, or equivalent

Tools & systems

Python (async, concurrent)
PyTorch (RL training)
RL frameworks (custom + open source)
Cloud infrastructure (AWS, GCP)
Evaluation and verifier systems

Posted 22 hours ago

Anthropic

The Research Engineer, Code Reinforcement Learning at Anthropic advances Claude’s ability to write, edit, test, debug, and ship real software end-to-end.

Responsibilities: designing RL environments and coding tasks; building the reward signals and verifiers that capture what good code means; running training experiments on frontier models; diagnosing why a model does (or doesn’t) improve at specific classes of software-engineering work; and improving the speed and reliability of the pipelines that enable rapid experimentation. Code RL spans several focus areas, from agentic coding behaviors and code correctness, to long-horizon autonomous engineering, to high-performance code for accelerators. The team matches candidates to the highest-impact areas.

Required: strong software-engineering skills with deep Python expertise (including async/concurrent programming); comfort owning systems end-to-end and debugging across the stack; the ability to balance research exploration with engineering implementation; rigor in experimental design and in interpreting results; care for code quality, testing, and performance; and a passion for safe and beneficial AI systems.

Strong candidates also have experience with RL, RLHF, post-training, or LLM fine-tuning, or have built coding agents, code-execution sandboxes, or similar systems.

Compensation: $500,000-$850,000 base plus substantial equity (Anthropic stock).

Role context

Research Engineers on Code Reinforcement Learning teams at frontier AI labs advance models’ ability to write, edit, test, debug, and ship real software end-to-end, on real codebases with real tools. At Anthropic, this role sits within the RL organization and blends research and engineering: designing RL environments and coding tasks, building reward signals and verifiers that capture what “good code” means, running training experiments on frontier models, and diagnosing why a model does or doesn’t improve at a class of software-engineering work. The team contributes to every Claude model release, with significant impact on coding capabilities. The $500,000-$850,000 range plus equity reflects frontier AI research pay bands.

Quick facts

State employment

35,000

Min experience

7 years

Hiring cycle

45 days

Top skills

RL environment design for code tasks
Reward signal and verifier development
Frontier model training experimentation
Async/concurrent Python programming
Cross-stack debugging (data pipelines to model training)

Frequently Asked Questions

What does "Code RL" focus on at Anthropic specifically?

Code Reinforcement Learning at Anthropic spans several focus areas: agentic coding behaviors (models that plan multi-step software changes), code correctness (verifying that the code actually works), long-horizon autonomous engineering (models that work on weeks-long projects), and high-performance code for accelerators (CUDA, Triton, Pallas optimization). The team matches engineers to whichever subarea has the highest leverage at the time of hiring.

Is prior RL experience required for this Code RL role?

No. Prior RL experience is listed as a strong bonus, not a strict requirement. The harder requirement is strong software-engineering skill with deep Python expertise. Engineers with backgrounds in compilers, language tooling, code execution, or developer infrastructure are competitive without RL-specific experience. The first 3-6 months involve significant RL ramp-up, so candidates who can demonstrate fast learning and quantitative rigor can compensate for a missing RL background.

What is Anthropic's compensation philosophy for senior research engineers?

Compensation at Anthropic for senior research engineers is among the highest in the industry, as the $500,000-$850,000 base range reflects. Total compensation including equity often exceeds $1M for highly experienced candidates. The philosophy is to compete with top labs (OpenAI, DeepMind) and frontier hedge funds for the same talent, and the pay level reflects the global scarcity of senior engineers with both RL and LLM experience.

Similar positions

Research Engineer — Safeguards Labs

San Francisco, CA · $350,000/yr

Research Engineer — Knowledge Foundations

San Francisco, CA · $350,000/yr

Data Scientist — Developer Productivity

San Francisco, CA · $275,000/yr

This listing aggregates publicly posted role information and adds market context. AIJobSearch.us operates in a commercial relationship with its partner platform.

Research Engineer — Code Reinforcement Learning