Full Stack AI Systems Engineer
Software Engineering, Data Science
San Diego, CA, USA
Posted on Jun 13, 2026
Are you a senior engineer who can keep large, AI-augmented systems running reliably at Apple scale? Apple's Stability Engineering team is looking for a seasoned engineer to join our Core team in San Diego. We build and operate the platforms, services, and infrastructure that turn crash reports from Apple devices into actionable engineering insights. You'll work on systems where LLMs and agents are already part of the production fabric — evolving them, hardening them, and using AI tools to extend what a small team can deliver.
Our team owns the end-to-end platform behind stability analysis at Apple: symbolication of crash logs across the company's hardware portfolio, the data pipelines that aggregate and cluster crash logs, and the applications and services that engineers across Apple use every day to drive operating-system quality. This role is about keeping that platform healthy, extending it deliberately, and making the engineering team itself more effective by using AI tools well.

Day to day, you'll spend most of your time on the engineering work of running real systems: tuning evaluation infrastructure, tightening operational controls, improving auditability and debug trails, and scaling the workflows our analysts rely on. When new capabilities are needed, you'll prototype them and integrate them into the platform. You'll partner closely with stability analysts who are domain experts in OS reliability, and with the broader team responsible for symbolication, ETL, and service infrastructure. You'll also be expected to use AI-assisted development tools fluently to investigate issues, refactor at scale, and ship more with a small team.

We're looking for someone with the rigor of a seasoned production engineer who is also comfortable operating systems that include LLMs and agents as first-class components. If you enjoy taking responsibility for a complex, already-running platform and making it steadily better, we want to talk.
- 5+ years of professional software engineering experience building and operating production systems
- BS in Computer Science or a related field, or equivalent practical experience
- Fluent use of AI-assisted development tools (coding agents, code review assistants, etc.) to work effectively at scale
- Demonstrated experience designing and scaling distributed systems (load balancing, active-active topologies, capacity planning, throughput-bound services)
- Track record of maintaining and evolving production services — observability, operational controls, incident response, and steady iteration on existing systems
- Strong full-stack instincts; comfortable spanning data infrastructure, backend services, and the user-facing surfaces that consume them
- Proven ability to operate independently on ambiguous, open-ended problems where the right answer is not obvious
- Experience operating LLM- or agent-based features in production environments over time
- Experience building or maintaining evaluation harnesses, audit trails, or replay infrastructure for AI systems
- Background in developer tools, observability, crash/stability analysis, or other operating-system-quality domains
- Familiarity with one or more of Ruby on Rails, Node.js/TypeScript, or Python for production services
- Experience working in environments with significant deferred scalability work (capacity-constrained, long-lead-time infrastructure)