Director, Site Reliability Engineer

- Qualys, Inc
- India

2025/05/11

Other Business Support Services

Job Description

Come work at a place where innovation and teamwork come together to support the most exciting missions in the world!

Qualys’ site reliability engineering (SRE) team supports all Qualys products across all of our production environments, including our 11 global multi-tenant platforms and more than 90 on-premises deployments. Effective incident management is a major part of our SRE effort: it minimizes the disruption an incident causes and restores normal business operations as quickly as possible.

We are seeking a highly motivated and talented Director, Site Reliability Engineering to lead our SRE team, which works on a 24/7 rotation. In this role, you will lead a group that responds proactively to alerts and is accountable for the efficiency and effectiveness of service delivery over the life cycle of an incident. You will also be responsible for deploying applications to production, automating those deployments, and keeping the production environments stable.

We are looking for an individual who believes in SRE principles, has a software engineering mindset, and wants to be part of an organization that is transforming itself to be more agile and nimble operationally.

Responsibilities

Ensure effective performance and 24x7 availability of all production systems.

Apply a strong understanding of industry best practices for Site Reliability Engineering and operations automation.

Proactively implement and improve the automation of application tasks.

Apply knowledge of system performance, testing, and programming to monitor, measure, and optimize system and application performance.

Work with other SRE leaders to set the enterprise strategy for designing and building resilience into application code.

Work closely with Product Management and with partner Sales and architecture teams.

Track record of success in delivering quality products from concept to launch

Monitor alerts from all Qualys platforms, and coordinate with Operations/SRE/DBRE/Engineering teams as necessary to take preventive or corrective action to resolve incidents, with the goal of minimizing MTTR.

Establish and manage an effective on-call rotation within the team.

Work with engineering teams to set up proper monitoring and alerting thresholds across all Qualys services and applications, so that the SRE team can focus on the key areas needed to stabilize the platforms.

Be accountable for platform uptime SLAs.

Desired Skills

15 or more years of experience working in application support or Site Reliability Engineering.

Experience in a leadership role on a development or engineering team

Strong prior production operations experience leading a first responder incident management team for a high-traffic platform.

Experience with CI/CD pipelines to automate the software delivery process.

Knowledge of cloud platform products and services; strong skills in developing cloud solutions and deploying applications on cloud platforms.

Solid exposure to monitoring tools such as Prometheus, ELK, Kibana, AppDynamics, Splunk, Grafana, etc.

Very good hands-on experience with Kubernetes, Jenkins, and Terraform templates.

Very good experience with capacity sizing for applications.

Good experience configuring and managing on-call and alerting platforms such as PagerDuty.

Comfortable working in a dynamic environment, with the ability to coordinate multiple tasks simultaneously.

Strong verbal and written communication skills are essential, as is the ability to work in a disciplined manner and remain composed under pressure.

Acquire and demonstrate expert knowledge of Qualys’ infrastructure, monitoring, products, and services.

Coordinate with the Incident Management team to produce weekly reports and dashboards for various products that clearly show, backed by data, any areas for improvement that need to be addressed.

Must have a strong passion for continuous improvement.