https://bayt.page.link/PNRg3xiroU29jqqKA

DevOps/Site Reliability Engineer (SRE)

- DISCOVERED
- UAE

Yesterday 2025/07/18

Other Business Support Services

Job Description

A highly motivated and skilled DevOps / Site Reliability Engineer (SRE) is needed to build, deploy, and maintain scalable, reliable infrastructure. This role involves working closely with development teams to ensure smooth deployment pipelines, system stability, and operational efficiency.

Key Responsibilities:

Infrastructure Automation & Management

Design, implement, and maintain CI/CD pipelines to streamline development workflows.
Develop scalable infrastructure for AI model deployment and management.
Automate infrastructure provisioning and management using Terraform, Ansible, or CloudFormation.
Optimize cloud-based and on-premises resources for scalability and cost efficiency.
Manage and optimize queuing systems and real-time streaming architectures.

System Reliability & Monitoring

Monitor and troubleshoot production systems to ensure uptime and performance.
Implement robust logging and alerting solutions using Prometheus, Grafana, ELK stack, or similar tools.
Set up comprehensive monitoring for system metrics and ML model performance.
Conduct root cause analyses and post-mortems to improve system reliability.

Collaboration & Support

Work with development and QA teams to integrate new features into production seamlessly.
Advocate for best practices in system architecture, security, and performance optimization.
Provide on-call support for critical production systems as part of a rotation schedule.

Security & Compliance

Ensure infrastructure security meets compliance standards (SOC 2, ISO 27001).
Manage secrets and credentials securely using Vault, AWS Secrets Manager, or similar tools.

Required Qualifications

Bachelor’s degree in Computer Science, Engineering, or equivalent experience.
Strong proficiency in at least one scripting language (Python, Bash, Go).
Hands-on experience with cloud platforms (AWS, Azure, Google Cloud).
Proficiency in containerization and orchestration (Docker, Kubernetes).
Experience with CI/CD tools (Azure DevOps, Jenkins, GitLab CI/CD, CircleCI).
Knowledge of monitoring and observability tools (Prometheus, Datadog, New Relic, Grafana, PagerDuty).
Understanding of networking concepts (DNS, load balancing, firewalls).
Familiarity with streaming architectures for real-time AI applications.

Preferred Qualifications