NVIDIA is looking for a Senior Cloud Infrastructure Engineer to join the Cloud Infrastructure Team within IPP (Infrastructure, Planning and Process). IPP is a global organization within NVIDIA that works with groups such as Graphics Processors, Mobile Processors, Deep Learning, Artificial Intelligence and Driverless Cars to meet their infrastructure needs. IPP's cloud services run almost half a million automated jobs per day on thousands of servers, supporting the productivity of thousands of NVIDIA's software engineers worldwide. The cloud hosts a heterogeneous mix of machines and devices running various operating systems (Windows/Linux/Android) on a multitude of hardware platforms, spanning both NVIDIA GPUs and Tegra processors. Are you passionate about distributed infrastructure and looking for sophisticated, critical issues to solve? Are you ready to build the next generation of cloud services, design creative solutions, and mine through data to uncover real problems and fix them? We are excited to onboard a fun-loving person like you.
What you'll be doing:
Design, implement, and maintain Software Defined Networking solutions for IPP's Internal Cloud Services based on OpenStack and Kubernetes.
Work with IPP's internal cloud teams to understand their current networking designs and propose implementations for secured network access.
Liaise with NVIDIA Enterprise Networking to architect and manage DMZs within data center environments to ensure secure data flow between external and internal networks.
Collaborate with security teams to design and enforce security policies for DMZs and the overall network architecture used within IPP's infrastructure.
Monitor network traffic, analyze performance metrics, and implement improvements as needed across IPP's Cloud Services.
Develop and maintain documentation for network configurations, procedures, and policies for IPP's Cloud Services.
Stay current with industry trends, technologies, and best practices related to SDN and network security.
Work with IT to drive RCCAs and provide support for network-related incidents, including troubleshooting and resolving issues in a timely manner.
Participate in planning and implementing network upgrades and enhancements.
Provide training and guidance to team members on SDN technologies and DMZ management.
What we need to see:
Bachelor’s degree in Computer Science, Information Technology, or a related field.
8+ years of relevant experience.
Strong track record of implementing network services in a variety of distributed computing and cloud environments and solutions such as OpenStack or Kubernetes.
Proven experience in a Network Engineering role with a focus on Software Defined Networking (SDN).
Strong understanding of network protocols, including TCP/IP, BGP, OSPF, and MPLS.
Experience with network security best practices and DMZ architecture.
Familiarity with networking hardware and software (routers, switches, firewalls).
Hands-on experience with high-performance networking and network optimization in highly available, large-scale, multisite, international environments.
Hands-on background in building tooling and automation for provisioning, monitoring, and managing network infrastructure.
Strong analytical and problem-solving skills.
Excellent communication and collaboration abilities.
Ways to stand out from the crowd:
Experience managing HPC clusters using BCM, Slurm, etc.
Experience with OpenStack Neutron.
Relevant certifications such as CCIE, CCNA, CCNP, or equivalent.
Special skills in large-scale and cluster computing (MPI) and data center design, including high-speed interconnects (InfiniBand), cluster storage, and scheduling-related design and/or management experience.
Knowledge of automation tools and languages (e.g., Python, Ansible) is a plus.
Strong background in Windows and Linux administration, as well as an understanding of dense datacenter design including compute, storage, and networking.