Operations Engineer

  • Company: Workday
  • Location: Boulder, Colorado
  • Posted: November 15, 2017
  • Reference ID: JR-22484
Join our team and experience Workday!

It's fun to work in a company where people truly believe in what they're doing. At Workday, we're committed to bringing passion and customer focus to the business of enterprise applications. We work hard, and we're serious about what we do. But we like to have a good time, too. In fact, we run our company with that principle in mind every day: One of our core values is fun.

Job Description
The PT Operations Engineering team is leading a number of initiatives in continued pursuit of our goal to provide highly automated delivery systems for Workday developers:
  • Continue to scale our growing infrastructure from supporting a single service and tens of customers to multiple services and thousands of Workday customers.
  • Evolve our deployment architecture from a low-density topology (a single VM per service) toward a high-density, container-native solution like Kubernetes or Docker Swarm.
  • Support delivery infrastructure and environments for 2 new services coming in 2018.
  • Enable delivery of PT services into AWS environments by using container-native technology. This will involve a lot of coordination with, and dependencies on, external teams.
  • Build out an infrastructure communication service to transmit information from our various environments & allow centralized, automated actions to be driven from simple APIs & chat. Ideally, this will be the start of ChatOps-driven management of our datacenters (pending review / approval).
  • Enhance our monitoring and telemetry for developers by providing a DSL-driven management infrastructure for dashboards and by expanding the types of information we collect to give developers greater context.
  • Continue to promote scalability improvements such as multi-tenancy, zero downtime, and resilience testing.

Today the PT OpsEng team has 5 members working on delivery into customer environments. The list above represents a high level view of that work for the next 12 months or so. We are looking to add a 6th member to this group who can learn about our infrastructure, help us grow and evolve, and contribute their own experience and creativity to benefit our customers as we grow.
An Operations Engineering hire in this role would have the following responsibilities:
  • Support customer environment projects with existing team members. This includes participating in on-call, improving our tools, supporting our development teams, and working with external teams.
  • Build relationships with external teams at Workday. This includes Infrastructure, Environment Operations, Development Teams, Testing teams, Program Management, Product Management, and a variety of others. We rely heavily on our relationships with other teams - you'll learn a lot about how Workday operates as a whole.
  • Identify ways to improve our tooling and process to minimize unplanned work & maximize team productivity. We are heavily focused on automation tooling and minimal manual process.
  • Contribute to reviewing changes made by other team members, providing support and mentoring to less experienced team members, and educating others across the organization about how our software operates in production environments.
  • Help our team maintain a very high quality bar in how we operate services through transparency, strong communication, and high quality software.

Joining this team, you can expect to become proficient in the following areas:
  • We operate at a large scale. We presently operate thousands of VM nodes across 40+ environments & re-deploy most of those nodes weekly.
  • We use Docker containers for most of our software delivery. This means you will become proficient using and supporting Docker in production.
  • Our configuration management is a combination of Ansible & internally developed Python and Go code - you will get experience working with and augmenting these tools for running infrastructure.
  • We work closely with development teams to allow developers to debug customer problems in production - this means building self service infrastructure that is intuitive and functional for developers.
  • We use Kanban & Jira for tracking our work and we rely on rapid iteration to evolve our tooling. We plan weekly, but quite a bit of decision-making happens ad hoc as we make progress & discover new information.

So, what abilities do we expect you to have?
  • You should be curious, inventive, and persistent. Many of the problems we deal with are not well defined and not previously understood - you'll be constantly learning.
  • You should be comfortable on the Linux command line and have some familiarity using command line tools to diagnose and resolve system problems.
  • You should have a basic understanding of the following: IP networking, software deployment & monitoring, firewalls, DNS, and storage.
  • You should have a desire to grow and learn about highly automated software delivery in a large scale SaaS environment.
