The SRE hired for this role will work as an embedded Reliability Engineer within a strategically selected engineering team. The candidate will belong to a horizontal domain called TechOps: Resilience Engineering. This position allows the SRE to move between multiple engineering platforms as required by the work, vision, and/or criticality of the projects. The role involves interacting with engineering leaders, engineers, product teams, Scrum/Agile leads, production support, and business and delivery teams.
We are looking for teammates with a “Just Do It” mindset who believe in our shared commitment to Listen with Empathy, Prioritize with Purpose, Operate with a Growth Mindset, and Foster Community & Trust.
This person will report to the Manager, Site Reliability Engineering, and will collaborate with teammates in various SRE functions across multiple geographies.
We are looking for talented and passionate full-stack developers with knowledge of datacenter infrastructure and cloud platforms.
Bachelor's degree in Computer Science or Engineering, or equivalent experience
Hands-on experience with the AWS cloud platform and infrastructure as code (IaC)
Hands-on experience with observability tools such as Splunk, SignalFx, New Relic, or equivalent
Proficiency in object-oriented programming, with 5-8 years of software development experience in Java, Python, JavaScript, or another modern OOP language
Basic understanding of, and working experience with, Docker, Kubernetes, or other container technologies
Familiarity with NoSQL and SQL strategies that ensure data storage is designed for security, reliability, availability, maintainability, and performance
Experience with CI/CD (Continuous Integration/Continuous Delivery), including tools such as Jenkins 2.0
Knowledge of GitHub (version control) and Jira (issue tracking/ALM tools)
Join us if you are willing to learn new technologies, share knowledge, and learn from others. You feel responsible for the success of the entire team. You are not afraid to take on challenging tasks when necessary, and you look for opportunities to help others, even those outside your own team.
As a site reliability engineer, you will focus on maximizing availability, observability, reliability, security, and performance for Nike Digital Experiences.
SREs perform deep problem analysis, detect infrastructure or code defects, define and create observability processes for Key Performance Indicators (KPIs) and report on them, and work with product delivery teams to provide long-term solutions to production issues.
Ability to observe, diagnose, and develop fixes for production issues quickly and efficiently
Ability to develop and drive real-time monitoring solutions that provide visibility into site health and key performance indicators
Strong written and verbal communication skills, including the ability to clearly articulate issues and their impact
Confidence and skill in reporting and communicating high-value metrics to leadership. Deep understanding of the business landscape and of how site reliability affects our consumers
Working understanding of IT service management (Incident, Problem, Change and Knowledge management)
Ability to work across teams (business and technical) to continuously analyze system performance in production, troubleshoot consumer-reported issues, and proactively identify areas in need of optimization
Practical experience in managing and leading application reliability practices for consumer-facing web and mobile experiences
Demonstrated negotiation and influencing skills
Passion for coaching, teaching, mentoring and learning