INTERNSHIP TERMS: SPRING 2018 (6 Months), SUMMER 2018 (3 Months), and FALL 2018 (6 Months)
At IBM, we have an amazing opportunity to transform the world with cognitive technology. By using the vast amounts of information available today to identify new patterns and make new discoveries, we are helping cities become smarter, hospitals transform patient care, financial institutions minimize risk, and pharmaceutical companies find cures for rare diseases. Join the forward-thinking teams at IBM solving some of the world's most complex problems. There is no better place to launch your career!
Site Reliability Engineer Interns work closely with Development to keep cloud-deployed services operating and performing at the levels promised to, and expected by, our customers. Site Reliability Engineer Interns are in demand across IBM's growth areas. You'll be matched with and deployed to a development team in a strategic business, based on your offered location and fit. These are office-based positions in IBM locations including:
AZ - Phoenix
CA - Almaden, Costa Mesa, Emeryville, Foster City, Redwood City, San Francisco, San Jose
CO - Denver
GA - Atlanta
MA - Andover, Cambridge, Littleton
MN - Rochester
NC - Raleigh-Durham
NY - New York City, North Castle, Poughkeepsie, Yorktown Heights
OH - Cleveland, Dublin, Hartland
OR - Hillsboro
PA - Blue Bell, Pittsburgh
TX - Austin, Dallas
VT - Essex Junction
Opportunities in these locations will vary based on business demand.
What You'll Do:
- You'll work in an Agile, collaborative environment to deploy, monitor, and maintain systems, which will include software installations, updates, and core services.
- You'll automate repetitive and error-prone tasks and processes, using tools like Ansible, Jenkins, Maven, Ant, Gradle, Chef, Puppet, Docker, UrbanCode, and a variety of scripting languages.
- You'll ensure adequate monitoring is in place and enhance or adjust it where needed, using tools like Elasticsearch, Prometheus, Marmot, New Relic, and the IBM Cloud Monitoring Service.
- You'll continuously measure availability, latency, and overall system health, using tools like Kibana, Grafana, Zabbix, and others.
- You'll help with capacity planning to ensure continuous performance of the cloud systems.
- You'll respond to incidents and drive changes that prevent the same issue from recurring. You'll also look for opportunities to automate recovery from incidents that may be difficult to prevent.
- You'll design and implement tools for automated deployment and monitoring of multiple environments.
- You'll troubleshoot and resolve incidents.
Who You Are:
- You are highly motivated and have a passion for ensuring scalable and highly available products.
- You have very strong verbal and written communication skills.
- You are great at solving problems, debugging, and designing and implementing solutions to complex technical problems.
- You are familiar with operating systems such as Linux, Windows, iOS and Android.
- You have a basic understanding of programming/scripting in a language such as Java, Bash, Python, or Ruby.
A little about us:
IBM is the world’s largest information technology company with more than 360,000 employees serving clients in 170 countries.