Site Reliability Engineer

  • Company: Capital One
  • Location: Wilmington, Delaware
  • Posted: January 07, 2017
  • Reference ID: R16893
Benjamin Franklin (18052), United States of America, Wilmington, Delaware

A successful Site Reliability Engineer (SRE) is responsible for the management and automation of production environments and applications, including availability, latency, performance, efficiency, change management, monitoring, emergency response, and capacity planning. The SRE must also support, deploy, and enhance the application infrastructure and act as a successful partner within the DevOps community. Providing solutions that support, architect, develop, and promote the organizational functions of Capital One is paramount. Primary responsibilities include interacting with delivery and development teams and consulting on new infrastructure and application projects that are architected and instantiated in production. This includes infrastructure support and interaction with server, database, networking, cloud engineering, telecom support, and development teams to continue to grow the capabilities and meet the needs of Capital One.

This position is responsible for supporting, managing, and implementing applications used to support various business units within Capital One. This position will function as the primary automation, application, and infrastructure support for various applications. A successful candidate in this role will ensure that established customer service and system availability targets are met. This position requires a fundamental understanding of several technologies involved in supporting every facet of the application lifecycle. Automation is a key tenet: all manual processes should be automated, and every issue that is addressed and alerted on should be actionable.

The SRE has a continued responsibility to provide upper-level development support and to interface with developers of various systems and applications. The SRE is responsible for deploying applications, or automating their deployment, and for implementing efficient, cost-effective application solutions. The SRE will analyze, design, implement, monitor, automate, and document application solutions to help provide outstanding service to the businesses supported.


  • Provide primary ownership of application stability, automation and reliability
  • Monitor and maintain applications per agreed-upon Service Level Objectives
  • Support and maintain configuration management for various applications and systems
  • Research and evaluate alternative solutions to complex issues using the most cost-effective and time-efficient methods
  • Identify and resolve a broad range of problems that occur in production applications and systems
  • Take part in the architecture and development lifecycle when implementing systems
  • Support the recovery and resiliency strategy and architecture for various applications and systems
  • Proactively support capacity planning and disaster recovery and resiliency aspects 
  • Govern support processes, resiliency and automation principles for the larger organization
  • Understand supported businesses, hold effective two-way dialogues with them, and make recommendations
  • Develop system administration standards and procedures to maintain consistent practices
  • Provide direction and guidance to other infrastructure and DevOps engineers
  • Work with business teams to identify complex requirements and their integration into existing and new technologies


This role will provide exposure to the following:

  • Various programming and scripting languages, such as Python, PowerShell, shell scripting, Batch, Perl, Java, C++, etc.
  • Various automation languages/platforms, such as Chef, Puppet, Ansible, Perl, Python, uDeploy, AHP, etc.
  • Multiple application platforms, such as Tomcat, WebSphere, WebLogic, JBoss, Apache, NGINX, PHP, IIS, .NET, etc.
  • Virtualization and cloud technologies and infrastructure
  • System architecture and design
  • Various database and big data platforms, such as MS SQL, Oracle, Hadoop, MySQL, etc.
  • Networking infrastructures and technologies, such as F5/load balancers, SD-WAN, routers/switches, etc.
  • Various monitoring technologies and methods, such as Splunk, New Relic, Aternity, Tealeaf, Riverbed (NetFlow), etc.
  • Various system platforms, such as UNIX, AIX, Linux, Windows, Docker, etc.
  • Security principles and architecture, such as CA, SSL, encryption, Identity Management, SOX, etc.

Basic Qualifications:

- High school diploma, GED, equivalent certification or military experience.

- At least 2 years of experience in Application Technical Support.

- At least 2 years of experience in Network Support.

- At least 2 years of experience in database and big data platforms.

- At least 1 year of experience in Software Development.

Preferred Qualifications:

- ITIL Foundation Certification.

- 1+ years of experience with application platforms such as Tomcat, WebSphere, WebLogic, JBoss, Apache, NGINX, PHP, IIS, or .NET.

- 3+ years of experience in Network Support using technologies such as F5/load balancers, SD-WAN, and routers/switches.

- 1+ years of experience in Reporting and Analysis.

- 1+ years of experience working in a Windows environment.

- Banking industry experience is a plus.

At this time, Capital One will not sponsor a new applicant for employment authorization for this position.
