Sr. Software Engineer - M365 Copilot Inference Team
Microsoft
The M365 Copilot Inference team is on a mission to build a high-performance, reliable, and compliant inference foundation that powers Microsoft 365 Copilot—optimizing model serving, GPU efficiency, routing, and global scale across commercial and restricted/sovereign cloud environments.
We build and operate shared inferencing services that enable partner teams to deploy and run models at scale while meeting enterprise expectations for trust, compliance, and operational excellence. These services span broad M365 environments and a large geographic footprint that supports performance-based routing and collocation patterns.
We are looking for a Senior Software Engineer to help deliver the next generation of inference platform capabilities. You will design, develop, and ship reliable services and components; collaborate closely with partner teams; and contribute to operational excellence for large-scale production systems.
Microsoft’s mission is to empower every person and every organization on the planet to achieve more. As employees, we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.
Responsibilities
- Requirements & design: Partner with stakeholders to determine user requirements for a set of scenarios; lead identification of dependencies and contribute to the development of design documents for a product, application, service, or platform.
- Build & ship: Initiate and build solutions to improve performance, maintainability, and effectiveness; deliver features through to production with quality and reliability.
- Service and system thinking: Contribute to architecture and implementation of services that support large-scale model serving, including considerations for request serving at scale (routing, load balancing, distributed caching/storage, networking, geo/disaster recovery).
- Operational excellence / live site: Serve as an accountable Designated Responsible Individual (DRI) within your scope; participate in on-call rotations and work with partners to monitor systems, products, and services for degradation, downtime, or interruptions, and take action to restore service.
- Reliability, observability, performance: Proactively seek new knowledge and adapt to new trends, technical solutions, and patterns that improve availability, reliability, efficiency, observability, and performance, while driving consistency in monitoring and operations at scale.
- Collaboration & execution: Work closely with engineers, PM, and partner teams to execute project plans, release plans, and work items; communicate clearly across organizational boundaries.
- Grow others: Lead by example and mentor others through code reviews and design reviews, helping improve code quality and engineering practices across the team.
Qualifications
Required Qualifications:
- Bachelor's Degree in Computer Science or related technical field AND 4+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python
- OR equivalent experience.
- Experience building applications and/or services in the cloud and delivering production-quality software with strong debugging and problem-solving skills.
- Ability to reason about modern distributed software design patterns and cloud systems architecture (e.g., microservices, containers, load balancing, queuing, caching) and to build, ship, and operate reliable solutions.
- Solid communication and collaboration abilities to align and deliver across organizational boundaries.
- Demonstrated ownership mindset, driving work end to end to deliver and operate reliable services.
Other Requirements:
- Ability to meet Microsoft, customer and/or government security screening requirements is required for this role. These requirements include but are not limited to the following specialized security screenings:
Preferred Qualifications:
- Master's Degree in Computer Science or related technical field AND 6+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python
- OR Bachelor's Degree in Computer Science or related technical field AND 8+ years technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Java, JavaScript, or Python
- OR equivalent experience.
- Experience designing distributed systems, near real-time processing solutions, or large data platforms; experience working with large-scale distributed systems (cloud/SaaS) in complex environments.
- Experience improving production systems with an engineering mindset around reliability, performance, and observability at scale.
- Familiarity with inference platform concerns such as model serving, GPU efficiency, and large-scale request routing; experience improving GPU inference efficiency (profiling, scheduling, batching, memory management, runtime optimization) is a plus.
This position will be open for a minimum of 5 days, with applications accepted on an ongoing basis until the position is filled.
Microsoft is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, citizenship, color, family or medical care leave, gender identity or expression, genetic information, immigration status, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran or military status, race, ethnicity, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable local laws, regulations and ordinances. If you need assistance with religious accommodations and/or a reasonable accommodation due to a disability during the application process, read more about requesting accommodations.