Site Reliability Engineer - II

Microsoft

Software Engineering
Posted on Oct 17, 2025

Bangalore, Karnataka, India

Date posted: Oct 17, 2025
Job number: 1895093
Work site: 3 days / week in-office
Travel: 0-25%
Role type: Individual Contributor
Profession: Software Engineering
Discipline: Site Reliability Engineering
Employment type: Full-Time

Overview

Microsoft is a company where passionate innovators come to collaborate, envision what can be, and take their careers further. This is a world of more possibilities, more innovation, more openness, and sky-is-the-limit thinking in a cloud-enabled world.

Microsoft’s Azure Data engineering team is leading the transformation of analytics in the world of data with products like databases, data integration, big data analytics, messaging & real-time analytics, and business intelligence. The products in our portfolio include Microsoft Fabric, Azure SQL DB, Azure Cosmos DB, Azure PostgreSQL, Azure Data Factory, Azure Synapse Analytics, Azure Service Bus, Azure Event Grid, and Power BI. Our mission is to build the data platform for the age of AI, powering a new class of data-first applications and driving a data culture.

Within Azure Data, the databases team builds and maintains Microsoft's operational database systems. We store and manage data in a structured way to enable a multitude of applications across various industries. We are on a journey to enable developer-friendly, mission-critical, AI-enabled operational databases across relational, non-relational, and OSS offerings.

We are hiring a Software Engineer 2 to join the Azure Cosmos DB team, where you will be working on a large-scale distributed operational database. In this role, you will work on distributed systems problems and technologies to help determine the future of our planet-scale database.

We do not just value differences or different perspectives. We seek them out and invite them in so we can tap into the collective power of everyone in the company. As a result, our customers are better served.

Qualifications

Required/Minimum Qualifications

  • Hands-On Experience: Demonstrate 3-8 years of practical experience in site reliability engineering within commercial large-scale software organizations.
  • Proficiency in coding languages (such as Python, .NET).
  • Live Site Troubleshooting: Adept at troubleshooting live site issues and providing guidance to engineering teams to resolve them promptly.
  • Cloud Proficiency: Possess a good understanding of public cloud offerings such as Azure, Google Cloud, or AWS.
  • Distributed Systems: Experience with distributed systems and microservice-based architectures.
  • Performance Analysis: Conduct in-depth analysis of web application performance, identifying bottlenecks and areas for improvement. Utilize various monitoring tools and performance profiling techniques to diagnose and troubleshoot performance issues.

Other Requirements

The ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screenings: Microsoft Cloud Background Check:

  • This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.

#azdat

#azuredata

#cosmosdb

Responsibilities

  • Operational Efficiency: Lead the design of systems and solutions at org scale, streamlining processes and enhancing efficiency.
  • AIOps: Use AI tools and agents to improve SLOs/SLAs and reduce toil.
  • Monitoring/Observability Architecture: Develop and implement monitoring agents, dashboards, escalations, and alerts to proactively manage and improve service reliability.
  • Incident Management: Participate in a distributed on-call rotation, drive root cause analysis during outages, and write and review postmortems to continuously improve our services and practices.
  • Team Growth: Advocate for SRE best practices, work independently, and help grow the SRE team by onboarding and mentoring new teammates.

Embody our culture and values


Benefits/perks listed below may vary depending on the nature of your employment with Microsoft and the country where you work.
  • Industry-leading healthcare
  • Educational resources
  • Discounts on products and services
  • Savings and investments
  • Maternity and paternity leave
  • Generous time away
  • Giving programs
  • Opportunities to network and connect

Microsoft is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, citizenship, color, family or medical care leave, gender identity or expression, genetic information, immigration status, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran or military status, race, ethnicity, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable local laws, regulations and ordinances. If you need assistance and/or a reasonable accommodation due to a disability during the application process, read more about requesting accommodations.