Job Description
We are seeking an experienced DevOps Engineer with a strong background in Site Reliability Engineering to join our team. The ideal candidate will have extensive experience managing and optimizing production environments, with a focus on release deployment, backup and restore processes, monitoring, and overall system health.
Responsibilities:
Manage and maintain the production environment, which includes Kubernetes, Linux, Kafka, Elastic Stack, Oracle DB, and microservices built using Java Spring Boot and Angular.
Automate the deployment of new application releases to the production environment.
Implement and maintain robust backup and restore procedures to ensure data protection and fast recovery.
Monitor the production environment, set up alerts, and investigate and resolve issues to maintain high availability and performance.
Perform regular health checks and proactively identify and address potential problems.
Collaborate with development teams to ensure smooth integration of new features and services.
Optimize infrastructure and application performance, identify bottlenecks, and implement solutions to improve efficiency.
Participate in on-call rotations and respond to incidents with a sense of urgency.
Document processes, create runbooks, and share knowledge with the team.
Continuously research and implement new technologies and best practices to enhance the platform.
Preferred Candidate
Degree: Bachelor's degree / higher diploma