Job Description
We have an exciting and rewarding opportunity for you to expand your skills and make a meaningful impact. Join a team committed to defining the Employee Platform software engineering function within HR Workforce Technology.
As an Observability & Automation Engineer at JPMorgan Chase within the Employee Platform, you will play a crucial role in ensuring the reliability, performance, and scalability of our cloud-based systems by implementing effective observability and automation practices and tools.
Job responsibilities
- Design and implement comprehensive observability solutions to monitor and analyze the performance and health of cloud and on-premises infrastructure, applications, and services.
- Develop custom monitoring tools, dashboards, and alerts to proactively identify and resolve issues before they impact users.
- Collaborate with cross-functional teams to define key metrics, alerts, dashboards, service level objectives (SLOs), and service level indicators (SLIs) to measure the reliability and availability of our systems.
- Implement automated testing, deployment, and provisioning processes to accelerate software delivery and ensure consistency across environments.
- Work closely with DevOps and SRE teams to integrate observability tools and practices into the CI/CD pipeline and infrastructure-as-code (IaC) workflows.
- Continuously evaluate and adopt new technologies, tools, and best practices to improve observability and automation capabilities.
- Troubleshoot complex technical issues related to performance, scalability, and reliability, and provide timely resolutions.
- Document observability and automation solutions, procedures, and configurations to facilitate knowledge sharing and enable effective collaboration.
Required qualifications, capabilities, and skills
- Formal training or certification in software engineering concepts and 3+ years of applied experience.
- 7+ years of experience as a DevOps Engineer, Site Reliability Engineer (SRE), or similar role with a focus on observability and automation.
- Hands-on experience with observability tools and technologies, including monitoring systems (e.g., AWS CloudWatch, AWS X-Ray, Prometheus, Grafana, Dynatrace), logging frameworks (e.g., ELK stack, Splunk), and distributed tracing (e.g., Jaeger, Zipkin).
- Proficiency in programming/scripting languages such as Python, Bash, or Go for automation and tool development.
- Experience with infrastructure automation tools such as Terraform, Ansible, or Chef for provisioning and configuration management.
- Solid understanding of containerization technologies (e.g., Docker, Kubernetes) and microservices architectures.
- Strong proficiency in AWS and on-premises infrastructure.
- Excellent analytical, problem-solving, and communication skills.
- Ability to work effectively in a fast-paced, dynamic environment and manage multiple priorities simultaneously.
Preferred qualifications, capabilities, and skills
- Certifications in relevant areas such as AWS Certified DevOps Engineer, Certified Kubernetes Administrator (CKA), or Certified Site Reliability Engineer (SRE).
- Experience with observability platforms and solutions like Datadog, Splunk, Dynatrace, or Apica.
- Familiarity with continuous integration/continuous deployment (CI/CD) pipelines and associated tools (e.g., Jenkins, Spinnaker, GitHub).
- Knowledge of modern software development practices, including Agile methodologies and DevOps principles.