Site Reliability Engineer

The SharePoint Online (SPO) Service Reliability Engineering (SRE) team delivers SharePoint services to customers globally. Most customers are Fortune 500 companies, and nearly all are recognizable household names. Today we are among the top hosted service providers in the world, with millions of seats sold and active on our platform. The SPO team is adding to its group of experts focused on site reliability and the user experience of our service. In this role you will see the hardest, most interesting problems in the SharePoint world, and your mission will be to drive a better experience with our service by eradicating problems encountered in run-state. The person who succeeds in this role will be creative and unrestrained by the current limitations of software and systems. They will be able to envision solutions to problems, take those solutions from concept through implementation, and champion them for the future. The ideal candidate must stay focused on our customers and understand how the solutions we engineer and run support our customers' objectives in choosing Office 365 as their online productivity and collaboration service.

The SPO SRE team has an immediate opening for a Site Reliability Engineer with technical chops and a passion for problem-solving, simplifying processes, and improving service metrics. We need a driven Site Reliability Engineer who can actively participate in the day-to-day combat of maintaining high service reliability and drive prioritization in fixing what may be broken today, and who can also envision, design, and implement processes and technologies that improve our ability to identify, isolate, correlate, and mitigate service-impacting problems in the system. Restoring service and making customers happy is not enough: you must also know enough coding to automate routine tasks in gathering, correlating, organizing, and presenting service metrics, in addition to performing detailed, in-depth root cause analysis.

Day-to-day responsibilities include:

• Service restoration
• Manage customer escalations
• Conduct root cause analysis
• Prescribe remediation
• Manage escalations and collaborations with Product Group via the SRE Leadership Team
• Find new sources and methods to generate service insights, and drive standardization in problem definition and prioritization toward solutions for top run-state issues
• Apply a broad understanding of large-scale system architecture, automation, integration, and processes
• Simplify problems into solutions, and drive initiatives with clear prioritization

Basic Qualifications:

1+ years of programming, software engineering, service engineering, site reliability, automation, software development, and/or software support experience

Skills, Qualifications and Experience:

• Extensive enterprise-level experience and technical depth in the technologies supporting SharePoint products, including SharePoint 2010, SharePoint 2013, SharePoint 2016, Office 365, SQL Server 2008/2012, IIS, and Windows Server 2008/2012
• We are a team of experts in the key areas necessary for delivering hosted SharePoint in the cloud. To this end, the base expectation is a functional knowledge of SharePoint and associated technologies, including architecture, operations, debugging, SQL, IIS, and PowerShell
• Expertise in SharePoint administration principles
• Large-scale web farm operations expertise, including tuning for high-traffic publishing portals
• Knowledge of SharePoint development techniques, including FTC and CAM; SQL Server 2008 administration; SQL high-availability and disaster recovery design; SQL Server 2008 operational strategy; and SQL performance analysis and troubleshooting
• A degree in Computer Science or a related field, a Microsoft technology-related certification, or equivalent experience is required

The ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screenings:
Citizenship Verification: This position requires verification of US Citizenship to meet federal government security requirements.
Fingerprint Background Check: This position will be required to pass a customer-required Fingerprint Background Check.
Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.
Public Trust Position: This position may require passing a United States Public Trust Position (PTP) background investigation to meet federal, state and/or local government requirements.


Microsoft is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, color, family or medical care leave, gender identity or expression, genetic information, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran status, race, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable laws, regulations and ordinances. If you need assistance and/or a reasonable accommodation due to a disability during the application or the recruiting process, please send a request to
