https://bayt.page.link/8zBhFEW5rD9sAyCK8

Site Reliability Engineer - 2/3

- Antal International
- Lucknow, India

Today 2025/01/24

Job Description

- Run the production environment by monitoring availability and taking a holistic view of system health.
- Improve the reliability, quality, and time-to-market of our suite of software solutions.
- Be the first person to report an incident.
- Debug production issues across services and levels of the stack.
- Envision the overall solution for defined functional and non-functional requirements, and define the technologies, patterns, and frameworks to realise it.
- Build automated tools in Python / Java / Go / Ruby etc.
- Help Platform and Engineering teams gain visibility into our infrastructure.
- Lead the design of software components and systems to ensure the availability, scalability, latency, and efficiency of our services.
- Participate actively in detecting, remediating, and reporting on production incidents, ensuring that SLAs are met and driving Problem Management for permanent remediation.
- Participate in the on-call rotation to ensure coverage for planned and unplanned events.
- Perform other tasks such as load testing and generating system health reports.
- Periodically check the readiness of all dashboards.
- Engage with other Engineering organizations to implement processes, identify improvements, and drive consistent results.
- Work with your SRE and Engineering counterparts to drive Game days, training, and other response readiness efforts.
- Participate in 24x7 support coverage as needed.
- Troubleshoot and solve complex issues with thorough root cause analysis on customer and SRE production environments.
- Collaborate with Service Engineering organizations to build and automate tooling, implement best practices to observe and manage the services in production, and consistently achieve our market-leading SLA.
- Improve the scalability and reliability of our systems in production.
- Evaluate, design, and implement new system architectures.
Some specific requirements:
- B.E./B.Tech. in Engineering, Computer Science, technical degree, or equivalent work experience.
- At least 3 years of experience managing production infrastructure. Leading / managing a team is a huge plus.
- Experience with cloud platforms such as AWS and GCP.
- Experience developing and operating large-scale distributed systems with Kubernetes, Docker, and serverless (Lambdas).
- Experience running real-time, low-latency, highly available applications (Kafka, gRPC, RTP).
- Comfortable with Python, Go, or any relevant programming language.
- Experience with monitoring and alerting using technologies such as New Relic / Zabbix / Prometheus / Grafana / CloudWatch / Kafka / PagerDuty etc.
- Experience with one or more orchestration and deployment tools, e.g. CloudFormation / Terraform / Ansible / Packer / Chef.
- Experience with configuration management systems such as Ansible / Chef / Puppet.
- Knowledge of load testing methodologies and tools such as Gatling and Apache JMeter.
- Ability to work your way around a Unix shell.
- Experience running hybrid clouds and on-prem infrastructure on Red Hat Enterprise Linux / CentOS.
- A focus on delivering high-quality code through strong testing practices.

Job Details

Job Location: Lucknow, India
Company Industry: Other Business Support Services
Company Type: Recruitment Agency
Employment Type: Unspecified
Monthly Salary Range: Unspecified
Number of Vacancies: Unspecified
