AI Safety Research Engineer Jobs $350K

Date Posted

Today

Remote Work Level

Hybrid Remote

Location

San Francisco, CA

Salary

$350,000

Job Schedule

Full-time

Benefits

Substantial equity (Anthropic stock)
Comprehensive health, dental, vision
401(k) with employer match
Generous PTO
Learning and development support

Requirements & tools

Education

BS, MS, or PhD in Computer Science, ML, Security, or equivalent

Tools & systems

Python (analysis, classifiers)
PyTorch (ML modeling)
SQL or notebook environments
Anthropic internal safeguards platform
Statistical analysis libraries

Posted 20 hours ago

Anthropic

The Research Engineer, Safeguards Labs at Anthropic leads and contributes to research projects investigating new methods for detecting misuse of Claude, identifying malicious organizations and accounts, strengthening model safeguards, and addressing other safety needs.

Day-to-day responsibilities include designing and running offline analyses over model usage data to surface abuse patterns; building classifiers and detection systems; evaluating their effectiveness; developing and iterating on prototypes that may eventually feed signals into real-time safeguards, partnering with engineers on tech transfer; contributing to a broader research portfolio investigating methods for detecting abusive behavior in chat-based or agentic workflows; building evaluations and methodologies for measuring whether safeguards actually work, including in agentic settings; and writing up findings clearly to inform decisions across Trust & Safety, research, and product teams.

Required: a track record of independently driving research projects from ambiguous problem statements to concrete results (in AI, ML, security, integrity, or related fields); comfort scoping your own work and switching between research, engineering, and analysis as projects demand; working familiarity with how large language models operate; and Python proficiency.

Compensation: $350,000 base plus substantial equity (Anthropic stock). Hybrid, SF or NYC.

Role context

Research Engineers on safeguards research teams at frontier AI labs investigate novel safety methods that protect AI models and the people who use them. At Anthropic, Safeguards Labs is a new team operating at the intersection of research and engineering: it prototypes new approaches to safe models, usage safeguards, and production safety using offline analysis and subsets of traffic before those approaches graduate into production systems. The team’s work overlaps with account abuse, model behavior safeguards, and other safeguard subteams. Safeguards Labs serves as a research arm that takes on ambitious, ambiguous problems and turns them into deployed defenses. Compensation: $350,000 base plus substantial equity. SF or NYC office expected.

Quick facts

State employment

35,000

Min experience

6 years

Hiring cycle

45 days

Top skills

Abuse detection system design for LLMs
Offline ML analysis of usage data
Classifier development and evaluation
Safety evaluation methodology design
Cross-functional partnership (Trust & Safety, Research, Product)

Frequently Asked Questions

What does Safeguards Labs do differently from Anthropic's main Safety team?

Safeguards Labs is a research arm, designed to investigate novel ideas through offline analysis and limited traffic subsets before they graduate into production safeguards. The main Safeguards teams operate production systems that catch abuse and enforce policies in real time. Labs is for ambitious, ambiguous research that may or may not become deployed defenses; the Safeguards production teams are for rigorously validated systems running 24/7. Labs research often takes 6 to 18 months from idea to deployed defense.

What background is competitive for this Safeguards Labs role?

Backgrounds in AI/ML research, software engineering, security research, integrity engineering, or trust & safety operations are all competitive. Anthropic specifically values a track record of independently driving research projects from ambiguous problem statements to concrete results. PhDs in ML or related fields are common but not required; strong engineering skills plus a demonstrated ability to ship research-quality work can substitute. Working familiarity with LLM behavior (sampling, prompting, training basics) is required.

How does Safeguards Labs balance research independence with mission alignment?

Safeguards Labs is intentionally structured around a 3:1 researcher-to-engineer mix, with each person having substantial latitude over their work. The team values self-directed researchers who can scope their own projects from ambiguous problem statements. At the same time, all work aligns with Anthropic's safety mission — projects are evaluated on potential impact on real-world safety outcomes, not just intellectual interest.

Similar positions

Research Engineer — Code Reinforcement Learning

San Francisco, CA · $500,000 - $850,000/yr

Research Engineer — Knowledge Foundations

San Francisco, CA · $350,000/yr

Data Scientist — Developer Productivity

San Francisco, CA · $275,000/yr

This listing aggregates publicly posted role information and adds market context. AIJobSearch.us operates in a commercial relationship with its partner platform.
