Manager, Site Reliability Engineer

  • Company: Capital One
  • Location: New York, New York
  • Posted: January 12, 2017
  • Reference ID: R17314
114 5th Ave (22114), United States of America, New York, New York

Do you think about writing beautiful code while eating breakfast? Do you build applications you are proud of and want to tell your friends about? Do you get excited scaling web and mobile applications to millions of users? Do you appreciate elegant solutions but also know when to build simple first and iterate? 

At Capital One, we are transforming everything we do as a digital bank. We are aggressively changing our business model from that of a traditional bank to a mobile-first business model. As we work to integrate the country's largest digital bank, we also want to change the way financial services operate in the digital space. Most industries that sell a fundamentally information-based product have been deeply disrupted by digital technology. Banking is ripe for this disruption. However, due to the regulatory environment, this disruption is unlikely to come from outside the industry and more likely to come from within.

As a Manager, Site Reliability Engineer on the Data Intelligence team, you will contribute to building a fast data and machine learning platform that scales to solve diverse business problems. We envision, create, deploy, and maintain full-stack technology solutions powered by streaming big data, state-of-the-art machine learning, microservice architecture, and intuitive visualizations in the cloud.

At Capital One, we have seas of big data and rivers of fast data. To manage this, we are working with a number of cutting-edge machine learning technologies, and are actively developing more. We are highly technical with strong backgrounds in what we do. Our use cases range from cyber threat prevention to predicting environment outages to enable our always-on 24/7 services. We have the highest executive support and direct impact on our customer experience and bottom line.


  • Implement changes to applications and infrastructure.  Configure and update appropriate monitors and alerts.  Ensure systems meet Capital One standards for security and resiliency.
  • Code Ansible Playbooks in an Amazon Web Services (AWS) Public Cloud environment
  • Code frameworks/APIs on AWS using Java/Python/Ruby/PHP SDKs
  • Program data ingestion/processing in any of the scripting languages
  • Deliver AWS-based infrastructure solutions using AWS CloudFormation (JSON) for configuration management
  • Migrate on-premises applications to AWS
  • Create models/diagrams/views to facilitate Infrastructure as a Service (IaaS), Software as a Service (SaaS), and Platform as a Service (PaaS) solutions, including JSON file creation
  • Develop procedures to automate various systems and tasks (e.g. automating code builds and deployments), including monitors and alerts as well as automated error detection using Splunk and Zabbix
  • Execute system administration of hosting platforms capable of running on a variety of frameworks (Java, Node.js, Ruby, PHP, Python)
  • Assist in the code promotion process to production environments
  • Work with production SQL and NoSQL databases to optimize performance and resiliency, including disaster recovery.



Basic Qualifications:

Bachelor's degree or military experience

At least 3 years of experience providing enterprise Linux-based system administration

At least 1 year of experience with Git or at least 2 years of experience with SVN or at least 1 year of experience working with Jenkins

At least 1 year of experience working with Python

At least 2 years of experience working with AWS cloud automation

At least 1 year of experience with Ansible or 2 years of experience in Chef or Puppet

Preferred Qualifications: 

1+ year of experience in an enterprise cloud environment using AWS

1+ year of experience working with Ansible

2+ years of experience working with Linux

At this time, Capital One will not sponsor a new applicant for employment authorization for this position.
