https://bayt.page.link/iW6trfj9b3hFhmV68

Job Description

About the Team:  
You will be part of a dynamic team that is at the heart of ensuring seamless operations across our platforms. Our team thrives on collaboration, innovation, and a commitment to delivering best-in-class services. We work closely with cross-functional teams including DevOps, Infrastructure, and Application Support to ensure the highest levels of service availability and performance. 
About the Role: 
We are looking for a detail-oriented and proactive Monitoring Engineer to join our team. This role involves setting up, managing, and maintaining monitoring systems that ensure the availability, performance, and reliability of our technology stack. As a Monitoring Engineer, you will play a crucial role in identifying and addressing potential system issues before they impact users. 

Responsibilities:


  • 24x7x365 on-call support (in rotation) to manage and execute the Incident Management process.
  • Fast and effective response to service failure Alerts and Notifications from a range of systems.
  • Impact and severity assessment of service failures for both internal and external stakeholders.
  • Management of bank/PG downtime and other service outages against SLA targets.
  • Escalation of downtime to the bank/PG, as well as internally.
  • Accurate tracking of progress and escalations on issues in internal ticketing systems.
  • Updating merchants/internal stakeholders on the status of any service outage, either directly by phone and email or via the ticketing tool.
  • Notifying merchants via email of any planned maintenance, either internal or Bank/PG.
  • Managing the outcomes of Reason for Outage (RFO/RCA) reports and Major Incident Reports (MIR), both internally and externally.
  • Hands-on experience with databases (SQL).
  • Hands-on experience with Python and shell scripting.
  • Software development to automate repeatable operations tasks (toil).
  • SRE Metrics & Monitoring Strategy (SLI, SLO, etc.).
  • Schedule and lead all continuous improvement activities, including Incident reviews, Change implementation reviews, TOIL automation candidate areas etc.
  • Optimize the Software Development Life Cycle (SDLC) based on post-incident reviews to boost service reliability.
  • Document the knowledge gained to ensure a seamless flow of information between teams.

Requirements:


  • 2-5 years of experience as an Application Support Engineer/Monitoring Engineer
  • Excellent communication skills
  • Patient & friendly attitude with excellent interpersonal skills
  • Ability to work on own initiative, working to and meeting tight deadlines
  • Flexible and able to work within a 24/7/365 shift pattern & rotational shifts
  • Advanced knowledge of SQL/Linux, Python/shell scripting, or comparable coding expertise
  • Knowledge of payment flows and processes
  • Must have an SRE background with a focus on automation

What we offer:


  • A positive, get-things-done workplace
  • A dynamic, constantly evolving space (change is par for the course – important you are comfortable with this)
  • An inclusive environment that ensures we listen to a diverse range of voices when making decisions.
  • Opportunity to learn cutting-edge concepts and innovation in an agile start-up environment with global scale
  • Access to 5000+ training courses, available anytime and anywhere, to support your growth and development (through corporate partnerships with top learning partners like Harvard, Coursera, Udacity)

