If you are passionate about large-scale, mission-critical software systems, and you have a maniacal focus on system availability and performance, Reliability Engineering is for you.
As a member of our Reliability Engineering team, you will be responsible for scaling some of the largest software products in Retail by automating the application infrastructure, deployment, and monitoring of those products in production. You will also be part of a 24x7 on-call team that leads the triage of incidents for your products, using your expertise to mitigate problems as quickly as possible. Our "own what you build" mentality empowers you to make decisions quickly and deliver reliability improvements without the red tape that typically surrounds enterprise environments. Our Reliability Engineering motto is: Enable Speed with High Availability.
You should have a passion for automating as much as possible and constantly be on the lookout for areas where operational and code efficiencies can be improved. You will work directly with product engineering teams leveraging XP principles, and, when you aren't automating all the things, you will be proactively executing destructive tests, participating in "game day" exercises, and related activities to improve the operational readiness of your product(s).
MAJOR TASKS, RESPONSIBILITIES AND KEY ACCOUNTABILITIES
70% - Delivery & Execution:
Collaborates and pairs with other product team members (UX, engineering, and product management) to create secure, reliable, scalable software solutions
Works with the Product Team to ensure user stories are developer-ready, easy to understand, and testable
Writes custom code or scripts to automate infrastructure, monitoring services, and test cases
Writes custom code or scripts to do "destructive testing" to ensure adequate resiliency in production
Configures commercial off-the-shelf solutions to align with evolving business needs
Creates meaningful dashboards, logging, alerting, and responses to ensure that issues are captured and addressed proactively
20% - Support & Enablement:
Fields questions from other product teams or support teams
Monitors tools and participates in conversations to encourage collaboration across product teams
Provides application support for software running in production
Proactively monitors production Service Level Objectives for products
Proactively reviews the Performance and Capacity of all aspects of production: code, infrastructure, data, and message processing
10% - Learning:
Participates in learning activities around modern software design and development core practices (communities of practice)
Proactively views articles, tutorials, and videos to learn about new technologies and best practices being used within other technology organizations
NATURE AND SCOPE
Typically reports to the Software Engineer Manager or Sr. Manager.
We recognize that military members are adept, motivated and hardworking. That’s why we made a commitment in 2012 to hire 55,000 veterans in 5 years!