Location:
Job ID:
Date Posted:
Company Name:
Profession (Job Category):
Job Schedule:
Remote:
Job Description:
Mission Statement:
As the AI Operations Lead, you will be responsible for ensuring the seamless operation of AI platforms, overseeing critical infrastructure, optimizing resource utilization, and managing incidents with a proactive approach. Additionally, you will play a strategic role in forecasting future AI infrastructure needs, scaling resources efficiently, and identifying opportunities to optimize costs while maintaining high performance and availability.
Your Responsibilities:
Overseeing the day-to-day operations of the AI platform infrastructure, including servers, databases, storage systems, network components, and associated software. This includes monitoring, maintenance, patching, and upgrades.
Implementing and maintaining monitoring systems to track platform performance, identify bottlenecks, and optimize resource utilization. This includes proactive performance tuning and capacity planning.
Managing problems by maintaining an overview of incidents and outages, troubleshooting issues, and implementing solutions to minimize downtime. This includes root cause analysis and preventative measures.
Collaborating with security teams to implement and maintain security measures to protect the platform and data. This includes vulnerability management, access control, and data encryption.
Developing and testing disaster recovery and business continuity plans to ensure the platform can be recovered in the event of a major outage.
Forecasting future infrastructure needs based on anticipated growth and usage patterns.
Identifying opportunities to reduce infrastructure costs without compromising performance or availability.
Living Hitachi Energy’s core values of safety and integrity, which means taking responsibility for your own actions while caring for your colleagues and the business.
Your Background:
At least a Bachelor’s degree in information systems.
5+ years of experience in IT operations in multinational companies.
3+ years of experience managing or leading an application or platform, preferably in an Azure-based AI/Cloud environment.
Experience with Azure AI & ML services (e.g., Azure Machine Learning, Databricks, Cognitive Services).
Ability to work with virtual delivery teams as well as geographically distributed stakeholders.
Demonstrated experience working collaboratively in geographically dispersed teams.
Proficiency in both spoken and written English is required.