Job Description
Eviden, part of the Atos Group, with an annual revenue of circa €5 billion, is a global leader in data-driven, trusted and sustainable digital transformation. As a next-generation digital business with worldwide leading positions in digital, cloud, data, advanced computing and security, it brings deep expertise to all industries in more than 47 countries. By uniting unique high-end technologies across the full digital continuum with 47,000 world-class talents, Eviden expands the possibilities of data and technology, now and for generations to come.
What impact you can make
You will play a critical role in designing, building, and maintaining data pipelines and systems to process, store, and analyse data efficiently.
Role and Responsibilities
1. Data Pipeline Development
- Build and maintain scalable, reliable, and efficient ETL (Extract, Transform, Load) pipelines using Python and Airflow.
- Automate data ingestion and processing workflows from multiple sources.
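To illustrate the kind of pipeline work involved, here is a minimal extract/transform/load sketch in plain Python; the CSV source and field names are invented for the example, and in Airflow each function would typically be wrapped as a task in a DAG.

```python
import csv
import io
import json

def extract(raw_csv: str) -> list[dict]:
    # Parse CSV text into a list of row dicts (stand-in for a real source).
    return list(csv.DictReader(io.StringIO(raw_csv)))

def transform(rows: list[dict]) -> list[dict]:
    # Keep the fields we need and cast amounts to float.
    return [{"id": r["id"], "amount": float(r["amount"])} for r in rows]

def load(rows: list[dict]) -> str:
    # Serialize to JSON lines for a downstream sink.
    return "\n".join(json.dumps(r) for r in rows)

raw = "id,amount\n1,10.5\n2,3.0"
output = load(transform(extract(raw)))
```

Keeping each stage a small, pure function makes the pipeline easy to test in isolation and to schedule with an orchestrator.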
2. Data Integration
- Integrate and transform data from disparate sources (e.g., APIs, third-party systems, legacy systems).
- Handle data standardization, validation, and quality assurance during integration.
3. Big Data Processing
- Utilize big data technologies such as Apache Spark and Snowflake for large-scale data processing.
- Write efficient and scalable Python scripts to process and validate the data.
4. Data Governance and Quality
- Implement data validation, cleaning, and transformation processes to ensure data accuracy and reliability.
- Enforce compliance with data governance policies and standards (e.g., GDPR, HIPAA).
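A sketch of what rule-based validation might look like in practice; the field names and rules below are illustrative only, and a real pipeline would log or quarantine the rejected rows rather than discard them.

```python
def validate(row: dict) -> list[str]:
    # Return a list of rule violations for one record (empty = valid).
    errors = []
    if not row.get("email") or "@" not in row["email"]:
        errors.append("invalid email")
    if row.get("age") is not None and not (0 <= row["age"] <= 130):
        errors.append("age out of range")
    return errors

def split_valid(rows: list[dict]) -> tuple[list[dict], list[dict]]:
    # Partition records into valid rows and rows needing remediation.
    valid, rejected = [], []
    for row in rows:
        (rejected if validate(row) else valid).append(row)
    return valid, rejected

rows = [{"email": "a@x.com", "age": 30}, {"email": "bad", "age": 30}]
valid, rejected = split_valid(rows)
```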
5. Collaboration
- Work closely with other teams to understand data requirements.
- Collaborate with software engineers to integrate data workflows into applications.
6. Monitoring and Optimization
- Monitor the performance of data pipelines and systems.
- Debug and optimize data workflows to improve efficiency and reliability.
7. Scripting and Automation
- Develop reusable and modular Python scripts for repeated tasks.
- Automate workflows for recurring data processing jobs.
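One common shape for such reusable helpers is a parameterized batch runner; the function names below are illustrative, not part of any specific codebase.

```python
from typing import Callable, Iterable, Iterator

def batched(items: Iterable, size: int) -> Iterator[list]:
    # Yield items in fixed-size chunks; the final chunk may be smaller.
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch

def run_job(records: Iterable, process: Callable[[list], None],
            batch_size: int = 100) -> int:
    # Apply a processing callback per batch; return the record count.
    count = 0
    for batch in batched(records, batch_size):
        process(batch)
        count += len(batch)
    return count
```

Separating chunking from processing lets the same runner serve many recurring jobs with different callbacks and batch sizes.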
8. Documentation and Best Practices
- Document pipeline architecture.
Required Skills and Experience
- Experience with PySpark and the Python language.
- Experience with OLAP systems.
- Experience with SQL (able to write complex SQL queries).
- Experience with orchestration tools (Apache Airflow preferred).
- Experience with the Hadoop ecosystem (Spark and Hive, including optimization of Spark and Hive applications).
- Knowledge of Snowflake (good to have).
- Experience with data quality (good to have).
- Knowledge of file storage (S3 is good to have).
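As an example of the "complex SQL" level expected, the query below ranks rows per group with a window function; it runs against an in-memory SQLite table whose schema is invented purely for the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales(region TEXT, rep TEXT, amount REAL);
INSERT INTO sales VALUES
  ('EU','ana',100),('EU','bo',250),('US','cy',300),('US','di',120);
""")

# Top-earning rep per region via RANK() OVER a partition.
top_per_region = conn.execute("""
SELECT region, rep, amount FROM (
  SELECT region, rep, amount,
         RANK() OVER (PARTITION BY region ORDER BY amount DESC) AS rnk
  FROM sales
)
WHERE rnk = 1
ORDER BY region;
""").fetchall()
```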
Let’s grow together.