Responsibilities
• Develop and implement a comprehensive risk management strategy that identifies potential threats to application stability and performance, including cybersecurity risks, and outlines proactive mitigation strategies.
• Oversee the creation and regular testing of disaster recovery plans to ensure rapid restoration of services after major incidents or disasters, minimizing downtime and data loss.
• Ensure compliance with relevant legal, regulatory, and security standards, such as GDPR, HIPAA, and SOC 2, to protect user data and privacy across all Sustain activities and release management processes.
• Manage relationships with software vendors, cloud service providers, and other third parties, negotiating service level agreements (SLAs) that align with the application's uptime and performance goals, with targets measured in minutes and hours.
• Lead incident response efforts as Incident Commander during major outages, coordinating across teams to restore service swiftly.
• Regularly review and manage the lifecycle of the technology stack, including software and hardware components, to ensure they are up to date, supported, and aligned with the application's long-term strategy.
• Establish mechanisms to capture and analyse user feedback, integrating insights into Sustain practices to enhance application usability, functionality, and satisfaction.
• Foster an environment of innovation within the Sustain team, encouraging the exploration and adoption of new technologies, methodologies, and practices to enhance application performance and reliability.
• Act as a champion for the Sustain practice within the organization, facilitating collaboration between development, operations, product management, and customer support teams to ensure a unified approach to application quality and reliability.
• Oversee the budget for the Sustain practice, including investments in technology, tools, and personnel. Conduct ROI analysis on Sustain initiatives to ensure resources are allocated efficiently and effectively.
• Lead change management efforts related to Sustain practices, ensuring that changes are communicated effectively and implemented smoothly, and that the team and stakeholders are aligned with new processes and technologies.
• Architect and lead the development of a comprehensive Sustain strategy and command centre for real-time monitoring and rapid incident response, ensuring the mobile application's high availability and reliability.
• Spearhead the build and release management process, optimizing deployment practices for speed, efficiency, and minimal user impact across a wide range of app versions, hardware types, and geographic regions.
• Mentor and guide a specialized team in Sustain practices, ServiceNow operations, and release management, promoting a culture of excellence and innovation.
• Implement and refine incident and problem management frameworks to proactively identify, address, and prevent system disruptions.
• Collaborate with development, operations, and product teams to align strategies, streamline workflows, and ensure cohesive release planning and execution.
• Design scalable infrastructure solutions and observability tools to support application reliability and operational insights.
• Employ observability tools and ML techniques to detect performance and reliability issues proactively and address them before they escalate.
• Establish, monitor, and report on service-level objectives (SLOs) and key performance indicators (KPIs), using data-driven insights to drive continuous improvement.
• Lead incident retrospectives and preventive strategy development to mitigate future risks and enhance system resilience.
• Promote best practices in software development lifecycle management, emphasizing reliability, quality assurance, and efficient deployment.
• Oversee capacity planning and resource allocation to accommodate application growth while maintaining peak performance.
• Champion the adoption of emerging technologies and methodologies in Sustain, ServiceNow, and AI/ML to maintain a competitive edge.
• Cultivate an environment of continuous learning, encouraging team development and the integration of industry-leading practices.

Qualifications
• Bachelor’s or master’s degree in computer science, engineering, or a related technical field.
• A minimum of 10 years of experience in senior technical roles, with a significant focus on Sustain practices, command centre operations, and build and release management.
• Demonstrated expertise in developing and leading Sustain strategies for large-scale, critical mobile applications, with proficiency in ServiceNow and related technologies.
• Strong foundation in software engineering principles, with hands-on experience in AI, ML, and computer vision technologies applied to mobile and web development.
• Expertise in incident management, system monitoring, and performance tuning within high-availability environments.
• Exceptional leadership skills, with a proven track record of managing specialized teams and fostering a culture of innovation and high performance.
• Outstanding problem-solving abilities, with a strategic approach to identifying and implementing preventative solutions.