We are seeking an experienced Observability SME with deep expertise in observability architectures and leading monitoring platforms. In this role, you will design, implement, and optimize end-to-end observability solutions for applications, infrastructure, and networks. The ideal candidate will have extensive hands-on experience with platforms such as ELK (Elasticsearch, Logstash, Kibana), Dynatrace, BMC TrueSight, and SolarWinds, and will use them to deliver reliable monitoring, alerting, and analytics that improve IT operations and service reliability.
Key Responsibilities:
• Observability Strategy & Architecture: Design and implement comprehensive observability solutions to monitor application, infrastructure, and network performance.
• Monitoring Tool Implementation & Optimization: Deploy and fine-tune monitoring solutions using ELK, Dynatrace, BMC TrueSight, and SolarWinds.
• Log Management & Analysis: Establish centralized logging, log parsing, and correlation for improved event detection and troubleshooting.
• Metrics & Performance Monitoring: Define KPIs, dashboards, and alerts for proactive IT service monitoring.
• Incident Management & Root Cause Analysis: Collaborate with IT operations, DevOps, and SRE teams to diagnose and resolve performance issues.
• Automation & Integration: Integrate monitoring tools with ITSM platforms, AIOps solutions, and automation frameworks for enhanced efficiency.
• Capacity Planning & Optimization: Analyze historical trends and real-time data to optimize resource allocation and performance.
• Stakeholder Collaboration: Work closely with developers, network engineers, system administrators, and business units to ensure observability best practices are followed.
• Continuous Improvement: Stay updated on emerging observability technologies and recommend improvements to existing processes and tools.