Job Description
We are seeking a skilled and experienced Data Engineer with expertise in PySpark and Azure Data Factory to join our team. The ideal candidate will have a minimum of 5 years of professional experience in data engineering and a strong understanding of data processing, transformation, and integration. Knowledge of Power BI or any other data visualization tool is highly preferred.
Key Responsibilities:
- Design, develop, and maintain scalable data pipelines using PySpark and Azure Data Factory.
- Transform and integrate large datasets from multiple sources to support analytical and business intelligence requirements.
- Collaborate with data analysts, data scientists, and other stakeholders to define data requirements and deliver solutions.
- Optimize data workflows for performance, scalability, and cost efficiency.
- Monitor and troubleshoot data pipelines to ensure high availability and reliability.
- Implement best practices for data governance, security, and compliance.
- Utilize Power BI or other visualization tools to support the creation of dashboards and reports (preferred).
Requirements
Required Skills and Qualifications:
- Experience: Minimum 5 years of hands-on experience in data engineering.
- Technical Expertise:
  - Strong knowledge of PySpark for data processing and transformation.
  - Proficiency in Azure Data Factory for building and managing ETL/ELT workflows.
  - Familiarity with data storage solutions (e.g., Azure Data Lake, SQL databases).
  - Experience with version control systems like Git.
- Visualization Skills: Knowledge of Power BI or other visualization tools is highly desirable.
- Problem-Solving: Excellent analytical and problem-solving skills to handle complex data challenges.
- Communication: Strong verbal and written communication skills to collaborate effectively with cross-functional teams.
- Preferred Skills:
  - Understanding of cloud services in Azure (e.g., Azure Synapse, Logic Apps).
  - Familiarity with Agile methodologies and CI/CD pipelines.
Education: Bachelor’s or Master’s degree in Computer Science, Data Engineering, or a related field.