Site Reliability Engineer

Imagine you have an exclusive backstage pass to one of the world’s largest cloud platforms, used by teams and companies worldwide. What skills, relationships and insights would you develop if your daily job gave you insider access to the teams that engineer and run Azure and Microsoft’s developer services?

As a Senior Service Engineer in the Visual Studio Team Services (VSTS) group you will:
• Build solutions that boost the reliability, performance and security of Microsoft’s developer services, and that automate and simplify how we work.
• Collaborate with other engineers to design and deliver solutions for disaster recovery, capacity management, monitoring, telemetry, and platform automation.
• Perform deep investigations that stretch your skills as you traverse rich telemetry streams to isolate and solve complex performance and reliability issues for online services.
• Collaborate very closely with Azure to design, operate and optimize large-scale online services used by teams and businesses across the globe.
• Continually innovate and push technology to its limits, in both the scale and the design of our developer services.

Our Service Engineers focus on our customers and on the service design that earns their trust. As we drive the maturity of our service, we regularly influence or contribute improvements to both our services and the Azure platform.

What we are looking for:
• At least 3 years of experience with one or more of: C#, PowerShell, ASP.NET/MVC, JavaScript, TypeScript, React or T-SQL
• BA/BS in Computer Science, Computer Engineering or a related technical discipline, or, in place of a four-year degree, equivalent industry internship or software engineering experience
• Minimum of 3 years of software development and automation experience
• Troubleshooting skills across network, application, caching, queuing, load-balancing, storage and distributed services layers
• Ability to conceptualize a distributed service, its dependencies and its transactional flow when troubleshooting
• Practical experience running large-scale online systems built on Azure or a similar cloud provider
• At least 3 years of experience designing and implementing solutions for platform and application layer telemetry and monitoring
• Experience coordinating resources across diverse teams to restore service and maintain SLAs; ITIL certification is preferred
• Strong communication skills, a key component of this role, with audiences that include customers, peers and, at times, executive leadership

At Microsoft’s Developer Division (DevDiv) we envision, create and run a broad array of online services used by developers and teams around the world. Our services run at massive scale and continue to grow. They are built on both public and private clouds and are architected with common capabilities and patterns that help us operate consistently and efficiently.

The ability to meet Microsoft, customer and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screenings:
Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.

Microsoft is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, color, family or medical care leave, gender identity or expression, genetic information, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran status, race, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable laws, regulations and ordinances. If you need assistance and/or a reasonable accommodation due to a disability during the application or the recruiting process, please send a request to