Job Description
We are looking for a GCP DevOps & Cloud Engineer to support AI/ML projects by designing, implementing, and managing cloud infrastructure on Google Cloud Platform (GCP). The role requires expertise in CI/CD, Kubernetes, Infrastructure as Code (IaC), security, monitoring, and optimizing AI/ML workflows. The ideal candidate will collaborate with data scientists, ML engineers, and software developers to ensure scalable and reliable deployment of AI/ML models.
Key Responsibilities:
GCP Infrastructure & Automation
• Design, deploy, and manage GCP cloud infrastructure for AI/ML applications.
• Implement Infrastructure as Code (IaC) using Terraform, Deployment Manager, or Pulumi.
• Manage and optimize AI/ML workloads using Vertex AI, AI Platform, and BigQuery ML.
CI/CD & MLOps
• Build and maintain CI/CD pipelines for AI/ML model training and deployment using Cloud Build, Jenkins, or GitHub Actions.
• Implement MLOps practices for automated model versioning, deployment, and monitoring.
• Integrate Kubeflow, TensorFlow Extended (TFX), or MLflow for model lifecycle management.
Containerization & Orchestration
• Deploy AI/ML applications using Docker and orchestrate with Kubernetes (GKE).
• Optimize containerized AI workloads for scalability and cost efficiency.
Security & Compliance
• Enforce IAM policies, data encryption, and network security best practices.
• Implement audit logging, monitoring, and access control for AI/ML pipelines.
• Ensure compliance with GDPR, HIPAA, or industry security standards.
Monitoring & Optimization
• Set up monitoring with Google Cloud Operations Suite (formerly Stackdriver), Prometheus, and Grafana.
• Optimize compute, storage, and networking costs for AI workloads.
• Implement logging and alerting strategies for AI model performance monitoring.
AI/ML Ecosystem:
• MLOps Tools: Kubeflow, MLflow, TFX, Vertex AI.
• AI Compute & Storage: AI Platform, BigQuery ML, Cloud TPU, Dataflow, Dataproc.
• Model Deployment: TensorFlow Serving, TorchServe, FastAPI for AI models.
Requirements
Experience:
• 5 years of relevant experience
Preferred Qualifications:
• Google Cloud Certifications:
    • Professional Cloud DevOps Engineer
    • Professional Cloud Architect
    • Professional Machine Learning Engineer (nice to have)
• Experience working with AI/ML workloads on Google Cloud.
• Strong knowledge of scaling AI models, optimizing ML training jobs, and managing feature stores.