About Us
Always open. Our code, our culture, our opportunities. Leading open innovation without limits. We are SUSE.
SUSE is a global leader in innovative, reliable and secure enterprise open source solutions, including SUSE Linux Enterprise (SLE), Rancher and NeuVector. More than 60% of the Fortune 500 rely on SUSE to power their mission-critical workloads, enabling them to innovate everywhere – from the data center to the cloud, to the edge and beyond. SUSE puts the “open” back in open source, collaborating with partners and communities to give customers the agility to tackle innovation challenges today and the freedom to evolve their strategy and solutions tomorrow.
We are open in our roots and open in our approach, striving to be the most trusted open innovator in the world. Openness extends beyond our technology. Our vibrant community thrives on diversity and connectivity without borders.
Job Description
We are seeking an accomplished Principal SaaS Observability & DevOps Engineer to join our expanding Rancher Observability team and drive the success of the SUSE Cloud Observability SaaS offering.
In this role, you will provide strategic guidance and hands-on expertise to ensure the reliability, scalability, and security of our observability platform. Leveraging your deep knowledge of DevOps best practices, AWS, Kubernetes (K8s), and IaC automation, you will drive innovation and continuous improvement across our SaaS infrastructure. You will collaborate closely with cross-functional teams and stakeholders, influencing the design and evolution of our services to meet the demands of a rapidly growing, cloud-native environment.
As a Principal DevOps Engineer, you will lead by example, collaborating closely with a talented group of DevOps engineers to shape the foundational infrastructure that supports Kubernetes observability at scale. You will tackle complex challenges, from orchestrating robust system architectures to optimizing automation pipelines, ensuring our platform meets the highest standards of availability, performance, and security. Your ability to balance hands-on troubleshooting with strategic thinking will play a central role in enhancing the overall robustness of our platform, from deployment pipelines and monitoring systems to incident response frameworks.
In addition, you will collaborate on broader architectural decisions that impact the future of our SaaS observability solutions, influencing processes and technical roadmaps that elevate our operational excellence. Your leadership and vision will help guarantee a resilient, high-performing platform while fostering a culture of innovation and learning. If you are motivated by complex challenges, eager to shape the next generation of observability services, and excited to have a significant impact on a global product, this role offers the opportunity to do just that. You will operate in a dynamic environment that values continuous growth, collaboration, and the pursuit of technical excellence.
Position Focus Areas
Maintain, optimize and elevate the SaaS platform, including Kubernetes clusters, SaaS instances, and AWS infrastructure.
Champion Infrastructure as Code (IaC) & GitOps. Define and implement automation strategies for provisioning and managing the SaaS platform, driving best-in-class consistency, security, and compliance.
Architect Control Plane Features. Collaborate with cross-functional teams to design and implement robust Control Plane functionalities, ensuring scalability, resilience, and seamless integration with cloud-native services.
Strategic Alerting & Monitoring. Develop and refine a proactive alerting and monitoring framework that maintains platform reliability and performance and enables rapid incident detection and resolution.
Ensure the security and compliance of the platform by adhering to industry best practices and policies.
Manage incidents efficiently, performing root cause analysis and implementing preventative solutions.
Oversee backup strategies and implement disaster recovery plans to ensure data integrity and business continuity.
Collaborate closely with development teams to build cloud-native, scalable, and secure applications.
Experience with containers and container orchestration/management solutions, such as Kubernetes, Rancher, or OpenShift, including deploying, managing, scaling, and monitoring containerized applications in production environments.
Proficiency in cloud technologies and platforms, such as AWS (ideally), GCP, or Azure, including experience with cloud services, deployment, infrastructure management, and cost optimization.
Familiarity with SRE practices such as implementing Service Level Objectives (SLOs), Service Level Indicators (SLIs), and working with error budgets to ensure the reliability and stability of the platform.
Experience with monitoring, observability, and alerting tools, such as Prometheus, Grafana, Datadog, or New Relic, including logs, metrics, traces, and dashboards to ensure system health and reliability.
Strong experience with Infrastructure as Code (IaC), using tools like Terraform, CloudFormation, or Pulumi to automate infrastructure provisioning and configuration management.
Experience with scripting and automation using languages like Python, Bash, or Golang, to automate tasks, improve operational efficiency, and develop tools for deployment and system management.
Hands-on experience with CI/CD pipelines and GitOps practices, using tools like GitLab CI/CD, ArgoCD, or Flux to automate deployments and manage infrastructure changes.
Strong understanding of Linux/Unix systems, including experience in system administration, performance tuning, and optimization in large-scale distributed systems.
Knowledge of disaster recovery and high availability strategies, including backup strategies, multi-region deployments, and automated failover.
Experience in incident management and root cause analysis (RCA), including responding to critical system alerts, managing on-call rotations, and conducting postmortem reviews to prevent future outages.
Understanding of networking concepts (DNS, HTTP, TCP/IP), including experience with load balancers, firewalls, and VPC design in cloud environments like AWS.
Familiarity with security best practices, including IAM, encryption, and compliance frameworks such as SOC 2, HIPAA, or GDPR to ensure secure and compliant cloud operations.
Experience in Golang, including its best practices and design patterns, to develop efficient, scalable, and maintainable codebases.
Strong written and verbal communication skills, with the ability to effectively communicate product architectures and design proposals.
Exhibit excellent analytical and problem-solving skills with a proactive approach to tackling complex issues.
Demonstrate a willingness and ability to learn new technologies and adapt to new challenges.
Be a great team player, promoting collaboration, teamwork, and valuing fearless feedback, both in giving and receiving.
Be productive while consistently maintaining a focus on high-quality standards.
Exhibit strong leadership and mentoring skills to guide and develop team members.
Maintain a customer-centric mindset, ensuring that solutions meet end-user needs.
Embrace continuous improvement, always seeking ways to enhance processes and deliverables.
Foster a positive and inclusive work culture that encourages innovation and diversity of thought.
What We Offer
We empower you to be bold, driving your career to create the future you want. We celebrate and reward your achievements.
SUSE is a dynamic environment that is evolving rapidly, thus requiring agility, strong entrepreneurship and an open mind.
This is a compelling opportunity for the right person to join us as we continue to scale and prosper.
If you’re a big thinker who is obsessed with execution and thrives in a dynamic environment in which you can tangibly create a lasting legacy, then please apply now!
We give you the freedom to be yourself. You will work in a global community of unique individuals – like you – with different backgrounds, talents, skills and perspectives. A truly open community where everyone is welcome, has a voice and is encouraged to reach their full potential regardless of age, gender, race, nationality, disability, sexual orientation, religion, or any other characteristics.
Sounds like the right fit for you? Click Apply to submit your resume. A recruiter will contact you if your skills match our current or any future positions. In the meantime, stay updated on the latest SUSE news and job vacancies by joining our Talent Community.
SUSE Values
We are passionate about customers
We are respectful and inclusive
We are empowered and accountable
We are trustworthy and act with integrity
We are collaborative
We are SUSE!