Job Description
Introduction
At IBM, work is more than a job. It's a calling: To build. To design. To code. To consult. To think along with clients and sell. To make markets. To invent. To collaborate. Not just to do something better, but to attempt things you've never thought possible. Are you ready to lead in this new era of technology and solve some of the world's most challenging problems? If so, let's talk.
Your Role and Responsibilities
As an SRE Engineer, you will be responsible for:
- Enabling automation for key functions such as CI/CD, observability, and alerting
- Specifying service level indicators (SLIs) and service level objectives (SLOs)
- Performing incident management and disaster recovery
- Providing on-call support and issue resolution
- Facilitating post-incident analysis
Responsibilities:
As an SRE Engineer, you are responsible for ensuring that the underlying infrastructure runs smoothly and that systems and tools work as expected.
- You will also monitor critical applications and services to minimize downtime and ensure their availability.
- Issue resolution:
- You will work closely with developers, especially when issues arise, helping them troubleshoot and providing consultation when alerts are raised.
- When a developer runs into a problem, you will investigate and resolve the issue.
- After an incident is resolved, you will revisit it and determine the root cause to prevent it from recurring.
- Cross-team collaboration: SREs work across different teams, mainly operations and development.
- By building reliable systems and supporting these teams, you free them to focus on building new features and delivering them to customers faster.
Required Technical and Professional Expertise
Required Skills:
- 6-15 years of relevant industry experience
- SRE experience managing cloud-based production applications
- Proficiency in monitoring critical applications and services to minimize downtime and ensure their availability
- Experience investigating and resolving issues that developers run into
Proficient in the following technologies:
- Continuous Integration and Delivery (CI/CD) across SDLC phases (Kubernetes, OpenShift, Tekton, Terraform)
- Secure Dev Practices
- Networking
- Monitoring
- Microservices Architecture
- Cloud Provider Experience (IBM, Azure, AWS, etc.)
- Alerts, Incident Response, Infrastructure Component Provisioning
Preferred Technical and Professional Expertise
Preferred Skills:
- Experience in cloud-based production applications
- Understanding of the storage domain; knowledge of Ceph is desired