System Reliability Engineer

  • Company: Capital One
  • Posted: January 26, 2017
  • Reference ID: R18313
Plano 1 (31061), United States of America, Plano, Texas

Great, and always improving, production support strategies are an essential ingredient of our current and future success. We are actively seeking talented system support and engineering specialists who can own support of applications and systems in production and can drive reliability and performance across massive scale by mastering the full depth of the stack. As an engineering-focused Production Support Specialist, you will have the opportunity to tackle complex problems of scale which are unique to tech companies while using your expertise in delivery and support of critical services. With a passion for DevOps and continuous improvement, you will be encouraged to raise the bar for other support specialists to help take their capabilities to the next level.


- Incident resolution and supporting production system deployments while ensuring SLAs are met.

- Deliver on Time to Resolve and Time to Detect reduction efforts.

- Identify and contribute to long-term solutions and preventative techniques.

- Increasing self-healing through closed-loop automation.

- Progress, protect, and provide for the applications and sub-systems behind all of Capital One’s external and internal customer-facing services with an ever-watchful eye on their availability, latency, performance, and capacity.

- Collaborating with other tech leads and support teams to ensure integrated end-to-end availability, reliability, and performance

- Support and deliver within Continuous Integration/Continuous Delivery pipelines

- Influencing resiliency and scalability in production environments in Amazon Web Services and other cloud platforms

- Equipping systems with automated monitoring and alerting

- Support the team and contribute to designing, writing and delivering technical and process automations to improve the availability, scalability, latency, and efficiency of Capital One’s services.

- Solve problems relating to mission-critical services and build automation to prevent problem recurrence, with the goal of automating response to all non-exceptional service conditions.

- Influence and support new designs, architectures, standards and methods for large-scale distributed systems.

- Engage in service capacity planning and demand forecasting, software performance analysis and system tuning.

- Identifying and remediating risk to critical and non-critical system KPIs

- At least 3 years of Support experience in one or more of the following:

  • Web applications built on open source technologies
  • Mainframe: CICS/MQ/DB2
  • Database: Oracle/SQL Server/PostgreSQL/MongoDB
  • Windows Servers with emphasis on command-line administration
  • Unix/Linux/AIX server administration
  • Networking: knowledge and understanding of network theory (TCP/IP, UDP, ICMP, DNS, OSI layers, and load balancing)

- At least 1 year of scripting experience in languages such as Bash, Java, Perl, Python, Ruby

- Systematic problem solving approach, coupled with a strong sense of ownership and drive

- At least 1 year of experience with Enterprise Monitoring Tools such as Splunk, BlueStripe, Zabbix, HPOM, Diagnostics, Sitescope, BSM, AppResponse, CA-Unicenter

- Familiarity with hosting apps in Cloud

- Understanding of Unix/Linux systems from kernel to shell and beyond, taking in system libraries, file systems, and client-server protocols along the way

Basic Qualifications:

- Bachelor’s Degree

- At least 1 year of experience in Application Development
- At least 1 year of experience in container technology

- At least 1 year of experience with ITIL practices and principles

- At least 3 years of Support experience in Web applications built on open source technologies, or at least 3 years in Mainframe, or at least 3 years in Database, or at least 3 years in Windows Servers, or at least 3 years in Unix/Linux/AIX server administration, or at least 3 years in Networking.

- At least 1 year of experience with Enterprise Monitoring Tools

Preferred Qualifications:

- Master’s Degree

- 5+ years of experience in Application Development
- 5+ years of experience in container technology
- 3+ years of Unix experience
- 5+ years of experience working in a middleware environment
- ITIL certification

- Technical certification(s) in Cloud or open source technologies

Capital One will not consider sponsorship for this position.
