Job Description
Summary:
The CloudOps Engineer is responsible for ensuring the smooth operation of cloud-based environments by overseeing incident response, monitoring, and optimization activities. This role involves managing the technologies that underpin these environments, including storage, network equipment, servers, and other hardware components. The CloudOps Engineer will also ensure timely patch management, monitor system performance, and collaborate with other teams to deliver high-quality service to both internal teams and customers. The position requires a proactive approach to identifying and resolving issues while maintaining operational metrics and documentation.
Key Responsibilities:
- Manage the smooth operation of cloud environments across AWS, Azure, GCP, VMware, and other platforms.
- Monitor cloud infrastructure for incidents, performance issues, and security vulnerabilities.
- Lead and execute system patching across cloud infrastructure, ensuring patches are applied promptly to mitigate security risks.
- Diagnose and resolve issues related to patch deployment and system performance post-patching.
- Ensure compliance with patch management policies by reviewing logs and generating reports on patch status.
- Collaborate with support and development teams to minimize disruptions and address any performance issues.
- Install, configure, and manage patching for traditional and cloud environments (Tanium, AWS, Azure, GCP, VMware).
- Analyze and review system performance, identifying bottlenecks and recommending improvements.
- Support the operation of complex cloud infrastructures by responding to alerts, troubleshooting issues, and ensuring maximum uptime.
- Participate in an on-call rotation that provides 24/7 availability to address critical operational issues.
- Maintain and update documentation regarding patching processes, schedules, compliance, and incident resolution.
- Communicate effectively with stakeholders and customers to manage expectations and build long-term relationships.
- Monitor and analyze system logs, responding to incidents and performing root cause analysis (RCA).
- Collaborate with team members to report operational metrics and key performance indicators (KPIs).
- Provide frontline support for Cloud Monitoring, including infrastructure and application management.
- Stay current on emerging technologies, security trends, and best practices to improve operational efficiency.