JOB PURPOSE:
The Senior APM Engineer is responsible for leading the end-to-end design, deployment, and optimization of Dynatrace and other application performance monitoring (APM) solutions, ensuring robust observability across applications, infrastructure, and business services. The role requires deep expertise in APM architecture, integrations, automation, and performance optimization.
ROLE AND RESPONSIBILITIES:
• Architect, deploy, and optimize Dynatrace solutions for comprehensive application and infrastructure monitoring.
• Implement and manage Dynatrace Cluster components for high-availability on-premises deployments.
• Develop advanced alerting strategies using AI-powered anomaly detection and root cause analysis.
• Design and build custom dashboards to provide actionable insights for IT and business stakeholders.
• Integrate Dynatrace with ITSM platforms (ServiceNow, BMC Helix, Remedy) and automation frameworks to streamline incident response.
• Leverage Dynatrace APIs to automate monitoring workflows, data extraction, and intelligent alerting.
• Extend observability capabilities to mainframes, databases, and network appliances as required.
• Diagnose and resolve complex performance issues, agent deployment challenges, and integration conflicts.
• Mentor junior engineers, fostering best practices in APM implementation, monitoring strategies, and automation.
• Provide subject matter expertise in Synthetic Monitoring, OpenTelemetry, and Cloud Observability integrations.
• Contribute to APM governance frameworks, compliance standards, and enterprise observability strategy.
• Deploy, configure, and manage Dynatrace Managed (On-Premises) environments, ensuring seamless integration with enterprise infrastructure.
• Develop and optimize Dynatrace dashboards tailored for performance analysis, IT operations, and executive-level reporting.
• Automate Dynatrace workflows using APIs, webhooks, and scripting (Bash/Python) to streamline observability operations.
• Troubleshoot complex observability issues, including multi-tier application performance degradation, database slowdowns, and network anomalies.