We're seeking a solution-oriented machine learning engineer with strong software development skills to architect, build, and maintain innovative evaluation tools and services for large-scale statistical assessment of GenAI-powered products, models, and AI agents. As a key member of our team, you'll deliver evaluation-as-a-service offerings that empower product and modeling teams across Apple to run comprehensive statistical evaluations, generate actionable metrics and insights, and make informed shipping decisions.
What You'll Do:
Partner with cross-functional teams to translate evaluation needs into robust technical solutions for conversational AI, language models, and AI agent capabilities
Own requirements gathering and proof-of-concept development end to end, and co-drive the development roadmap for ML system evaluation platforms
Design and implement scalable systems for statistical analysis of product experiences, model performance, and AI agent behavior
Drive system integration efforts and influence how evaluation software is incorporated into ML model and AI agent CI/CD pipelines
Develop monitoring and observability solutions to provide deep insights into platform performance, evaluation quality, and AI agent reliability
Build specialized evaluation frameworks for AI agents, including multi-step reasoning assessment, tool usage validation, and agent interaction quality measurement
Iterate rapidly based on stakeholder feedback while maintaining platform reliability and performance across diverse AI workloads
The ideal candidate thrives in fast-paced environments, combines strategic thinking with hands-on problem-solving, and is passionate about enabling data-driven decisions that enhance Apple product experiences for millions of users. You'll be instrumental in building the next generation of evaluation infrastructure that supports Apple's expanding AI agent capabilities.