Senior Service Engineer

Imagine you have an exclusive backstage pass to one of the world’s largest cloud platforms, used by teams and companies around the globe. What skills, relationships and insights would you develop if your daily job gave you insider access to the teams that engineer and run Azure and Microsoft’s developer services?

As a Site Reliability Engineer (SRE) in the Visual Studio Team Services (VSTS) group, you will:

•Build solutions that boost the reliability, performance and security of Microsoft’s developer services, and that automate and simplify how we work.

•Collaborate with other engineers to design and deliver solutions for disaster recovery, capacity management, monitoring, telemetry, and platform automation.

•Perform deep investigations that stretch your skills as you traverse rich telemetry streams to isolate and solve complex performance and reliability issues for online services.

We collaborate very closely with Azure to design, operate and optimize large-scale online services used by teams and businesses across the globe. Across our developer services, we continually innovate and push technology to its limits in both scale and design.

Luck favors the prepared: by training and experimenting in our failure-mode environment, you’ll be ready to provide the DevOps leadership needed to mitigate outages quickly. Our Site Reliability Engineers focus on our customers and on the service design that earns their trust. As we mature our services, we regularly influence or contribute improvements to both our services and the Azure platform.

What we are looking for:

•Comfort coordinating resources across diverse teams to restore service and maintain SLAs

•Troubleshooting skills across network, application, caching, queuing, load-balancing, storage and distributed services layers

•Ability to conceptualize a distributed service, its dependencies and its transactional flow when troubleshooting

•Practical experience running online systems built on Azure or similar cloud providers

•Experience designing and implementing telemetry and monitoring solutions at both the platform and application layers

•Strong communication skills, a key component of this role, with audiences that include customers, peers and, at times, executive leadership

•BA/BS in Computer Science, Computer Engineering or a related technical discipline, or 4 years of industry internship or software engineering experience

•3 to 5 years of software development and automation experience with one or more languages or frameworks such as C#, PowerShell, ASP.NET/MVC, JavaScript, TypeScript, React and T-SQL is preferred

At Microsoft’s Developer Division (DevDiv), we envision, create and run a broad array of online services used by developers and teams around the world. Our services run at massive scale and continue to grow. They are built on both public and private clouds and are architected with common capabilities and patterns that help us operate consistently and efficiently.

Microsoft Cloud Background Check: Candidates for this position will be required to pass the Microsoft Cloud background check upon hire/transfer and every two years thereafter.

Microsoft is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, color, family or medical care leave, gender identity or expression, genetic information, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran status, race, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable laws, regulations and ordinances. If you need assistance and/or a reasonable accommodation due to a disability during the application or the recruiting process, please send a request to