What You Will Do:
Create Blue/Green Deployment capabilities that will reduce availability risk during monolith software releases. The Engineer will also create Disaster Recovery capabilities aligned with Cybersecurity OKRs, help standardize on and adopt a single Observability platform, and help migrate to Terraform as the standard IaC tool.
Technical/Operational/Functional Expertise:
Develop and implement disaster recovery plans and procedures using AWS services.
Create, maintain, and enhance automated product deployments
Develop, modify, support, and maintain AWS/Azure-based components through Infrastructure as Code and automation
Enhance availability and incident management by implementing alert-driven self-healing of solutions
Continuously improve monitoring and alerting capabilities so that we can be proactive rather than reactive
Conduct risk assessments and business impact analyses to identify potential threats and their impact on business operations.
Design and configure DR solutions on AWS, including backup, replication, and failover strategies utilizing services such as AWS Backup, AWS Elastic Disaster Recovery, and AWS CloudFormation.
Perform regular DR tests and simulations to ensure the effectiveness of recovery procedures.
Monitor and maintain AWS DR infrastructure, ensuring it is up-to-date and functioning properly.
Collaborate with IT and cloud teams to integrate DR plans with existing AWS infrastructure and services.
Provide training and support to staff on AWS DR procedures and best practices.
Ensure compliance with industry standards and regulatory requirements for disaster recovery on the cloud.
Document AWS DR processes and maintain detailed records of all DR activities.
Your goals will include:
Deployment management
Meeting goals for Key Performance Indicators (KPIs), Service Level Agreements (SLAs), and Operating Level Agreements (OLAs)
Maintaining high levels of system uptime
Creating, defining, managing, tracking, and improving processes to ensure effective service delivery
What Skills & Experience You Should Bring:
AWS certifications such as AWS Certified Solutions Architect, AWS Certified SysOps Administrator, or AWS Certified DevOps Engineer preferred
Experience in a similar role within a large-scale enterprise environment.
Knowledge of serverless architectures and containerization (e.g., AWS Lambda, Amazon ECS/EKS).
3 to 5 years of experience with monitoring solutions (Datadog, Nagios, New Relic)
Proficiency with container technologies such as Docker, Kubernetes, Amazon ECS, and Amazon EKS.
Proven experience in disaster recovery planning and implementation, specifically on AWS.
Strong knowledge of AWS services related to DR, including AWS Backup, AWS Elastic Disaster Recovery, AWS CloudFormation, AWS Lambda, and Amazon S3.
Experience with Infrastructure as Code (IaC) tools such as AWS CloudFormation or Terraform.
Familiarity with networking, security, and storage solutions on AWS.
Excellent problem-solving skills and attention to detail.
Ability to work independently and as part of a team.