General Engineering Consulting

Job Description

Looking for a challenging role? If you really want to make a difference, make it with us.


Can we energize society and fight climate change at the same time?


At Siemens Energy, we can. Our technology is key, but our people make the difference. Brilliant minds innovate. They connect, create, and keep us on track towards changing the world’s energy systems. Their spirit fuels our mission.


Our culture is defined by caring, agile, respectful, and accountable individuals. We value excellence of any kind. Sounds like you?


We are seeking a highly skilled and driven Senior AI Engineer to join our team as a founding member, developing the critical data and AI infrastructure for training foundation models for power grid applications. You will be instrumental in building and optimizing the end-to-end systems, data pipelines, and training processes that will power our AI research. Working closely with research scientists, you will translate cutting-edge research into robust, scalable, and efficient implementations, enabling the rapid development and deployment of transformational AI solutions. This role requires deep hands-on expertise in distributed training, data engineering, and MLOps, along with a proven track record of building scalable AI infrastructure.


Your new role – challenging and future-oriented:


  • Design, build, and rigorously optimize everything necessary for large-scale training, fine-tuning, and/or inference with different model architectures. This covers the complete stack, from data loading to distributed training to inference, with the goal of maximizing MFU (Model FLOPs Utilization) on the compute cluster.
  • Collaborate closely and proactively with research scientists, translating research models and algorithms into high-performance, production-ready code and infrastructure. Implement, integrate, and test the latest advancements from research publications or open-source code.
  • Relentlessly profile and resolve training performance bottlenecks, optimizing every layer of the training stack from data loading to model inference for speed and efficiency.
  • Contribute to technology evaluations and selection of hardware, software, and cloud services that will define our AI infrastructure platform.
  • Use MLOps frameworks (MLflow, W&B, etc.) to implement best practices across the model lifecycle (development, training, validation, and monitoring), ensuring reproducibility, reliability, and continuous improvement.
  • Create thorough documentation for infrastructure, data pipelines, and training procedures, ensuring maintainability and knowledge transfer within the growing AI lab.
  • Stay at the forefront of advancements in large-scale training strategies and data engineering, and proactively drive improvements and innovation in our workflows and infrastructure.
  • Act as a high-agency individual, demonstrating initiative, problem-solving, and a commitment to delivering robust and scalable solutions for rapid prototyping and turnaround.

We don’t need superheroes, just super minds:


  • Bachelor's or Master's degree in Computer Science, Engineering, or a related technical field.
  • 5+ years of hands-on experience in a role specifically building and optimizing infrastructure for large-scale machine learning systems.
  • Deep practical expertise with AI frameworks (PyTorch, JAX, PyTorch Lightning, etc.). Hands-on experience with large-scale multi-node GPU training and other optimization strategies for developing large foundation models across various model architectures. Ability to scale solutions involving large datasets and complex models on distributed compute infrastructure.
  • Excellent problem-solving, debugging, and performance optimization skills, with a data-driven approach to identifying and resolving technical challenges.
  • Strong communication and teamwork skills, with a collaborative approach to working with research scientists and other engineers.
  • Experience with MLOps best practices for model tracking, evaluation, and deployment.

Desired skills:


  • A public GitHub profile demonstrating a track record of open-source contributions to relevant projects in data engineering or deep learning infrastructure is a big plus.
  • Experience with performance monitoring and profiling tools for distributed training and data pipelines.
  • Experience writing CUDA/Triton/CUTLASS kernels.



