https://bayt.page.link/SF7yAqP1457g9YJU8

Senior Site Reliability Engineer (SRE)

- HR Force International
- Lahore, Pakistan

Date posted: 2025/06/29

Other Business Support Services

Job Description

Job Overview:

We are looking for a highly skilled Senior Site Reliability Engineer (SRE) with expertise in monitoring, performance optimization, and high availability for SaaS web applications. The ideal candidate will be responsible for building, scaling, and maintaining reliable systems that handle large traffic loads with minimal downtime. This role focuses on monitoring application performance, uptime, and reliability, and involves working closely with engineering and DevOps teams to maintain seamless customer experiences. If you have a passion for automation that improves reliability and scalability while keeping critical services up, we’d love to have you on our team.

Key Responsibilities:

Monitoring and Observability:
- Design and implement monitoring solutions to ensure the health, performance, and availability of SaaS web applications and infrastructure.
- Develop and maintain dashboards, alerts, and reporting systems for proactive monitoring of application performance, user experience, and system health.
- Ensure end-to-end observability by integrating log aggregation, metrics, and tracing tools to identify and resolve issues before they impact customers.
Incident Management & Root Cause Analysis:
- Lead the response to production incidents, working with cross-functional teams to identify the root cause and implement effective remediation strategies.
- Drive post-incident reviews and document incidents, identifying areas for improvement in systems, processes, and response strategies.
- Create and enforce procedures for incident management, on-call rotations, and escalations.
Reliability & Availability:
- Collaborate with engineering and DevOps teams to implement strategies for ensuring high availability, scalability, and disaster recovery for critical services.
- Build and deploy robust monitoring frameworks and automation tools so that systems handle high traffic loads and remain resilient to failures.
- Focus on reducing mean time to recovery (MTTR) and increasing mean time between failures (MTBF) across the SaaS platform.
Automation & Efficiency:
- Drive automation efforts to eliminate manual intervention and improve system reliability through automated testing, deployment, and monitoring pipelines.
- Collaborate with the development team to implement changes that improve system reliability and efficiency.
Capacity Planning & Performance Tuning:
- Monitor system resource usage and identify potential capacity issues, driving proactive scaling and performance tuning initiatives.
- Use performance metrics to predict scaling needs and ensure the infrastructure can meet the growing demands of the platform.
Collaboration & Cross-Functional Engagement:
- Work closely with developers, product managers, and DevOps engineers to improve application performance and reliability through better code, infrastructure, and operational practices.
- Act as a mentor to junior SREs, sharing knowledge about best practices for monitoring, scaling, and troubleshooting complex web applications.
Continuous Improvement & Best Practices:
- Establish and promote best practices for reliability engineering, monitoring standards, incident management, and performance optimization.
- Stay current with industry trends and evaluate new tools and technologies to improve service reliability and monitoring practices.
