
Job Description

Overview

As an Observability Engineer, you will be responsible for designing, implementing, and maintaining observability solutions that provide actionable insights into the health and performance of our key AI platforms, applications, and infrastructure. You will work closely with development, operations, and security teams to ensure comprehensive monitoring, logging, and alerting across our technology stack.

Responsibilities

Design and maintain observability frameworks, tools, and dashboards to monitor system performance, availability, and reliability.
Implement metrics collection, logging, and tracing solutions using industry-standard tools such as Prometheus, Grafana, ELK stack, and Jaeger, and frameworks like OpenTelemetry.
Utilize AI and ML observability tools like WhyLabs, UpTrain, and others to ensure comprehensive coverage.
Set up and configure alerts to proactively detect and respond to system issues.
Work with teams to define Service Level Objectives (SLOs) and Service Level Indicators (SLIs) for critical applications and services.
Implement and manage RAG (Retrieval-Augmented Generation) assessment tools.
Collaborate with development and operations teams to integrate observability solutions into the CI/CD pipeline.
Ensure observability solutions are aligned with security and compliance requirements.
Analyze and troubleshoot performance bottlenecks and system issues.
Provide actionable insights to improve system performance and reliability.
Utilize tools such as Logs GPT, Giskard, BabyAGI, and AutoGPT for enhanced observability.
Document observability standards, practices, and procedures.
Train and mentor team members on observability tools and best practices.

Qualifications

Education and Experience:
Bachelor’s or master’s degree in Computer Science, Artificial Intelligence, or a related field.
At least 5 years of professional experience in AI and machine learning.
Proven experience in developing and deploying generative AI models in a professional setting.
Previous experience in a consumer goods company or a related industry is a plus.

Required Skills and Qualifications:
Advanced programming knowledge in Python, with a deep understanding of its libraries and frameworks.
Proficiency with observability tools such as Prometheus, Grafana, ELK stack, Jaeger, WhyLabs, and UpTrain.
Strong understanding of distributed systems, microservices architecture, and cloud-native technologies.
Experience with containerization and orchestration tools like Docker and Kubernetes (AKS).
Expertise in Azure OpenAI and Azure OpenAI Studio.
Strong API development skills and experience with integrating AI models into applications.
Strong problem-solving and analytical skills.
Excellent communication skills to collaborate effectively with cross-functional teams.
Understanding of ethical considerations in AI, focusing on transparency and fairness.

Preferred Skills:
Knowledge of microservices architecture and RESTful APIs.
Familiarity with DevOps practices and CI/CD pipelines on Azure.
Understanding of agile methodologies and Azure DevOps (ADO).

Job Details

Job Location
India
Company Industry
Other Business Support Services
Company Type
Unspecified
Employment Type
Unspecified
Monthly Salary Range
Unspecified
Number of Vacancies
Unspecified
