Staff Site Reliability Engineer, Managed AI
Company: Crusoe
Location: Sunnyvale
Posted on: February 14, 2026
Job Description:
Crusoe's mission is to
accelerate the abundance of energy and intelligence. We’re crafting
the engine that powers a world where people can create ambitiously
with AI — without sacrificing scale, speed, or sustainability. Be a
part of the AI revolution with sustainable technology at Crusoe.
Here, you'll drive meaningful innovation, make a tangible impact,
and join a team that’s setting the pace for responsible,
transformative cloud infrastructure.

About the Role:
At Crusoe, our
Site Reliability Engineering team ensures the reliability and
scalability of Crusoe’s AI-optimized cloud platform. We’re looking
for a Staff Site Reliability Engineer with a strong background in
distributed systems and hands-on experience with large language
models to help us build and operate managed AI services at scale.
This role is central to delivering highly available, performant,
and cost-efficient AI infrastructure that powers compute-intensive,
latency-sensitive workloads for our customers.

What You’ll Work On:
- Design and operate reliable managed AI services with a focus on serving and scaling LLM workloads
- Build automation and reliability tooling to support distributed AI pipelines and inference services
- Define, measure, and improve SLIs/SLOs across AI workloads to ensure performance and reliability targets are met
- Collaborate with AI, platform, and infrastructure teams to optimize large-scale training and inference clusters
- Automate observability by building telemetry and performance tuning strategies for latency-sensitive AI services
- Investigate and resolve reliability issues in distributed AI systems using telemetry, logs, and profiling
- Contribute to the architecture of next-generation distributed systems purpose-built for AI-first environments

What You’ll Bring:
- Strong software engineering background, with experience building production-grade systems beyond scripting or Bash
- Demonstrated experience in distributed systems design and implementation
- Hands-on work with large language models (LLMs) or AI/ML infrastructure
- SRE mindset and experience (whether or not under the SRE title), including:
  - Defining and measuring SLIs/SLOs
  - Building monitoring and observability systems
  - Driving performance and reliability improvements
  - Designing fault-tolerant systems and automated testing strategies
- Proficiency in at least one modern programming language (Python, Go, Java, C++)
- Familiarity with Kubernetes or container orchestration platforms
- Strong collaboration and communication skills
- Ability to thrive in a fast-paced, mission-driven environment

Bonus Points: Experience
scaling inference or training workloads for LLMs

Benefits:
- Industry competitive pay
- Restricted Stock Units in a fast-growing, well-funded technology company
- Health insurance package options that include HDHP and PPO, vision, and dental for you and your dependents
- Employer contributions to HSA accounts
- Paid Parental Leave
- Paid life insurance, short-term and long-term disability
- Teladoc
- 401(k) with a 100% match up to 4% of salary
- Generous paid time off and holiday schedule
- Cell phone reimbursement
- Tuition reimbursement
- Subscription to the Calm app
- MetLife Legal
- Company-paid commuter benefit: $300 per month

Compensation:
Compensation will be paid in the range of $204,000 - $247,000 + Bonus. Restricted
Stock Units are included in all offers. Compensation to be
determined by the applicant’s education, experience, knowledge,
skills, and abilities, as well as internal equity and alignment
with market data.

Crusoe is an Equal Opportunity Employer.
Employment decisions are made without regard to race, color,
religion, disability, genetic information, pregnancy, citizenship,
marital status, sex/gender, sexual preference/orientation, gender
identity, age, veteran status, national origin, or any other status
protected by law or regulation.
Keywords: Crusoe, Pleasanton, Staff Site Reliability Engineer, Managed AI, IT / Software / Systems, Sunnyvale, California