Introduction
At IBM, work is more than a job. It’s a calling: To build. To design. To code. To consult. To think along with clients and sell. To make markets. To invent. To collaborate. Not just to do something better, but to attempt things you’ve never thought possible. Are you ready to lead in this new era of technology and solve some of the world’s most challenging problems? If so, let’s talk.
Your Role and Responsibilities
We are looking for a highly self-motivated Site Reliability Engineer to join our team and develop software systems and automated solutions for the organization’s operations. Responsibilities include monitoring large-scale computer applications, building alerts for the operational issues that computer systems can experience, automating the mitigation steps that resolve those alerts, and giving the application development team recommendations and requirements that prevent alert conditions from occurring.
Automate the critical jobs across the entire platform to minimise manual errors and human intervention
Document runbooks, incident responses, post-mortem reports, root cause analyses (RCAs), etc., with clear mitigation steps and action items
Work closely with development, QA, and operations teams to understand requirements, provide technical guidance, and facilitate seamless integration and deployment processes.
Evaluate, implement, and manage DevOps tools and technologies to enhance productivity and efficiency.
Implement effective monitoring of all events and logs, with the right alerting and escalation for critical alerts
Contribute to the maintenance and support of production systems as necessary
Troubleshoot exceptions, performance issues, latencies, and errors across multiple technologies.
Debug code issues based on web service and API responses, errors, events, logs, etc.
Required Technical and Professional Expertise
2 to 3 years of proven experience as a DevOps Engineer or in a similar role, with hands-on experience in CI/CD, infrastructure automation, and cloud services
Understanding of, and experience with, cloud technologies and microservices architecture
Proficient in Linux, scripting, command-line tools, and general system debugging
Experience with Docker, Kubernetes, and CI/CD tools such as Tekton, ArgoCD, etc.
Experience with CI/CD tools like Jenkins, GitLab CI, or Travis CI
Good problem-solving and analytical skills
Good communication and cross-functional collaboration skills
Have an enthusiastic, go-for-it attitude
Preferred Technical and Professional Expertise
Experience working on large-scale SaaS applications