Research Engineer — Safeguards Labs

Date Posted
Today
Remote Work Level
Hybrid Remote
Location
San Francisco, CA
Salary
$350,000
Job Schedule
Full-time
Benefits
Substantial equity (Anthropic stock); Comprehensive health, dental, vision; 401(k) with employer match; Generous PTO; Learning and development support

Requirements & tools

Education
BS, MS, or PhD in Computer Science, ML, Security, or equivalent
Tools & systems
Python (analysis, classifiers); PyTorch (ML modeling); SQL or notebook environments; Anthropic internal safeguards platform; Statistical analysis libraries

Anthropic

The Research Engineer, Safeguards Labs at Anthropic leads and contributes to research projects investigating new methods for detecting misuse of Claude, identifying malicious organizations and accounts, strengthening model safeguards, and addressing other safety needs. Day-to-day responsibilities include designing and running offline analyses over model usage data to surface abuse patterns; building classifiers and detection systems and evaluating their effectiveness; developing and iterating on prototypes that may eventually feed signals into real-time safeguards (partnering with engineers on tech transfer); contributing to a broader research portfolio investigating methods for detecting abusive behavior in chat-based or agentic workflows; building evaluations and methodologies for measuring whether safeguards actually work (including in agentic settings); and writing up findings clearly to inform decisions across Trust & Safety, research, and product teams. Required: a track record of independently driving research projects from ambiguous problem statements to concrete results (in AI, ML, security, integrity, or related fields); comfort scoping one's own work and switching between research, engineering, and analysis as projects demand; working familiarity with how large language models operate; Python proficiency. Compensation: $350,000 base plus substantial equity (Anthropic stock). Location: hybrid, SF or NYC.

Role context

Research Engineers on Safeguards Labs at frontier AI labs investigate novel safety methods that protect AI models and the people who use them. At Anthropic, this is a new team operating at the intersection of research and engineering — prototyping new approaches to safe models, usage safeguards, and production safety through offline analysis and subsets of traffic before they graduate into production systems. The team’s work overlaps with account abuse, model behavior safeguards, and other safeguard subteams. Safeguards Labs serves as a research arm taking on ambitious, ambiguous problems and turning them into deployed defenses. Compensation: $350,000 base plus substantial equity. SF or NYC office expected.

Quick facts

State employment
35,000
Min experience
6 years
Hiring cycle
45 days
Top skills
Abuse detection system design for LLMs; Offline ML analysis of usage data; Classifier development and evaluation; Safety evaluation methodology design; Cross-functional partnership (Trust & Safety, Research, Product)

Frequently Asked Questions

What does Safeguards Labs do differently from Anthropic's main Safety team?

Safeguards Labs is a research arm, designed to investigate novel ideas through offline analysis and limited traffic subsets before those ideas graduate into production safeguards. The main Safeguards teams operate production systems that catch abuse and enforce policies in real time. Labs is for ambitious, ambiguous research that may or may not become deployed defenses; the Safeguards production teams are for rigorously validated systems running 24/7. Labs research often takes 6 to 18 months from idea to deployed defense.

What background is competitive for this Safeguards Labs role?

Backgrounds in AI/ML research, software engineering, security research, integrity engineering, or trust & safety operations are all competitive. Anthropic specifically values track records of independently driving research projects from ambiguous problem statements to concrete results. PhDs in ML or related fields are common but not required; strong engineering skills plus a demonstrated ability to ship research-quality work can substitute. Working familiarity with LLM behavior (sampling, prompting, training basics) is required.

How does Safeguards Labs balance research independence with mission alignment?

Safeguards Labs is intentionally structured around a 3:1 researcher-to-engineer mix, with each person having substantial latitude over their work. The team values self-directed researchers who can scope their own projects from ambiguous problem statements. At the same time, all work aligns with Anthropic's safety mission — projects are evaluated on potential impact on real-world safety outcomes, not just intellectual interest.

This listing aggregates publicly posted role information and adds market context. AIJobSearch.us operates in a commercial relationship with its partner platform.

About Anthropic

Anthropic
Operating in: Tech, AI Safety, Trust & Safety
Location: San Francisco, CA
Listing on AIJobSearch.us · partner platform