Job Description
Company Description
Bosch Global Software Technologies Private Limited is a 100% owned subsidiary of Robert Bosch GmbH, one of the world's leading global suppliers of technology and services, offering end-to-end Engineering, IT and Business Solutions. With over 22,700 associates, it is the largest software development center of Bosch outside Germany, making it the Technology Powerhouse of Bosch in India, with a global footprint and presence in the US, Europe and the Asia Pacific region.
Job Description
As a Site Reliability Engineer (SRE), you will be responsible for ensuring the reliability, scalability, and performance of the systems that support the products and services of the Data Engineering projects.
You will work closely with function developers, architects and DevOps teams to build and maintain high-availability systems that can handle high workloads, with automation and active monitoring of the infrastructure.
As an SRE, you will ensure system reliability and availability for continuous deployment as part of the Agile practices in solution development.
Mandatory Skills & Experience:
Experience with cloud platforms, specifically Azure.
Hands-on experience and proficiency in cloud infrastructure and CI/CD frameworks for IaC (Terraform, ARM, YAML), and in cloud-native containerization and deployment of services (e.g., Docker, k8s).
Hands-on experience with large-scale Azure DevOps and Azure PaaS components.
Must-have tool knowledge: Argo (including Argo Events and Argo Workflows), Terraform (CLI), Azure CLI, kubectl, Flux, Helm, Istio, Grafana and Kustomize, plus YAML-based coding and debugging skills.
Must have a Kubernetes administration skill set; knowledge of tools and extensions for Kubernetes is good to have.
Understanding of function development for data science solutions, and experience with programming languages such as Python and Go.
Excellent problem-solving skills and attention to detail.
Hands-on experience architecting and developing features using microservice application principles.
Deep understanding of Service Level Objectives (SLOs), Service Level Indicators (SLIs), error budgeting and configuring KPIs for highly sophisticated services.
Experience with the ELK stack (Elasticsearch, Logstash, Kibana) and Prometheus for monitoring and logging.
Solid expertise in applying cloud security best practices through DevSecOps principles, with a deep understanding of Kubernetes (k8s) security.
Preferred Skills & Experience:
Experience with DevOps, data pipelines and various messaging systems in a cloud-native setup (Microsoft Azure).
Experience with database technologies (e.g., MongoDB and other NoSQL databases) and cloud-native optimization services.
Strong working knowledge of Azure.
Motivating attitude, excellent communication and strong interpersonal skills, and a structured, analytical way of working.
Knowledge of costing and optimization techniques for large-scale cloud-native services.
Key Responsibilities:
System Reliability: Design and engineer highly scalable, highly available systems for high-throughput workloads.
Continuous monitoring & active alerting: Develop, deploy, and manage monitoring systems, setting up alerts to proactively identify and resolve issues.
Automation: Automate routine tasks such as deployments, monitoring, and policy enforcement using suitable frameworks.
Performance Tuning: Optimize system performance by identifying bottlenecks and implementing appropriate solutions.
Infrastructure as Code (IaC): Utilize tools like Terraform, Ansible, or similar to manage infrastructure through code, ensuring consistency and repeatability.
Security: Understand and implement the security policies and enforcement measures defined by the organization for infrastructure and data.
Scaling & Cost Management: Analyze system performance and costs, and plan for future scaling needs.
Issue Handling and Resolution: Respond to system outages, perform root cause analysis, and implement fixes to prevent future incidents.
Qualifications
Master's or Bachelor's degree in Computer Science, Information Science or an equivalent engineering stream.
Additional Information
6-8 years of hands-on experience maintaining large-scale, high-availability data engineering solutions and services.