https://bayt.page.link/gc6YwM4t8E5NHqCx7

Job Description

Overview

• Develop and implement a comprehensive risk management strategy that identifies potential threats to application stability and performance, including cybersecurity risks, and outlines proactive mitigation strategies.
• Oversee the creation and regular testing of disaster recovery plans to ensure rapid restoration of services after major incidents or disasters, minimizing downtime and data loss.

Responsibilities

• Ensure compliance with relevant legal, regulatory, and security standards, such as GDPR, HIPAA, or SOC 2, to protect user data and privacy across all sustain activities and release management processes.
• Manage relationships with software vendors, cloud service providers, and other third parties, negotiating service level agreements (SLAs) that align with the application's uptime and performance goals, which are measured in minutes and hours.
• Lead incident response efforts as Incident Commander during major outages, coordinating across teams to restore service swiftly.
• Regularly review and manage the lifecycle of the technology stack, including software and hardware components, to ensure they are up to date, supported, and aligned with the application's long-term strategy.
• Establish mechanisms to capture and analyse user feedback, integrating insights into sustain practices to enhance application usability, functionality, and satisfaction.
• Foster an environment of innovation within the sustain team, encouraging the exploration and adoption of new technologies, methodologies, and practices to enhance application performance and reliability.
• Act as a champion for the sustain practice within the organization, facilitating collaboration between development, operations, product management, and customer support teams to ensure a unified approach to application quality and reliability.
• Oversee the budget for the sustain practice, including investments in technology, tools, and personnel, and conduct ROI analysis on sustain initiatives to ensure resources are allocated efficiently and effectively.
• Lead change management efforts related to sustain practices, ensuring that changes are communicated effectively and implemented smoothly, and that the team and stakeholders are aligned with new processes and technologies.
• Architect and lead the development of a comprehensive Sustain strategy and command centre for real-time monitoring and rapid incident response, ensuring the mobile application's high availability and reliability.
• Spearhead the Build Release management process, optimizing deployment practices for speed, efficiency, and minimal user impact across a wide range of app versions, hardware types, and geographic regions.
• Mentor and guide a specialized team in Sustain practices, ServiceNow operations, and release management, promoting a culture of excellence and innovation.
• Implement and refine incident and problem management frameworks to proactively identify, address, and prevent system disruptions.
• Collaborate with development, operations, and product teams to align strategies, streamline workflows, and ensure cohesive release planning and execution.
• Design scalable infrastructure solutions and observability tools to support application reliability and operational insights.
• Employ observability tools and ML techniques to proactively monitor for performance and reliability issues and address them before they affect users.
• Establish, monitor, and report on service-level objectives (SLOs) and key performance indicators (KPIs), using data-driven insights to drive continuous improvement.
• Lead incident retrospectives and preventive strategy development to mitigate future risks and enhance system resilience.
• Promote best practices in software development lifecycle management, emphasizing reliability, quality assurance, and efficient deployment.
• Oversee capacity planning and resource allocation to accommodate application growth while maintaining peak performance.
• Champion the adoption of emerging technologies and methodologies in Sustain, ServiceNow, and AI/ML to maintain a competitive edge.
• Cultivate an environment of continuous learning, encouraging team development and the integration of industry-leading practices.

Qualifications

• Bachelor's or master's degree in Computer Science, Engineering, or a related technical field.
• A minimum of 10 years of experience in senior technical roles, with a significant focus on Sustain practices, command centre operations, and build release management.
• Demonstrated expertise in developing and leading Sustain strategies for large-scale, critical mobile applications, with proficiency in ServiceNow and related technologies.
• Strong foundation in software engineering principles, with hands-on experience in AI, ML, and Computer Vision technologies applied to mobile/web development.
• Expertise in incident management, system monitoring, and performance tuning within high-availability environments.
• Exceptional leadership skills, with a proven track record of managing specialized teams and fostering a culture of innovation and high performance.
• Outstanding problem-solving abilities, with a strategic approach to identifying and implementing preventative solutions.

