Data Pipeline Development: Builds an understanding of the client's data needs and designs, constructs, installs, tests, and maintains highly scalable data management systems using the Microsoft Fabric suite (Azure Data Factory, Azure Synapse Analytics, etc.) and other relevant technologies that efficiently and effectively meet those needs.
Data Availability: Performs data availability assessments and implements robust processes to ensure the timely identification, collection, and validation of the data sets the client requires.
ETL Processes: Develops ETL processes to extract data from various sources, transform the data according to business rules, and load it into a centralised data repository, ensuring data accuracy and availability (a PySpark sketch follows this list).
Data Lake: Implements and manages data storage solutions using OneLake in Microsoft Fabric and ensures an optimal data storage architecture for ease of access and analysis.
Data Integration: Integrates data from various business systems into a unified data platform, enabling a consolidated view of information across the organisation.
Data Quality and Governance: Ensures data accuracy and quality by implementing data governance and quality control measures, including data validation and cleansing.
Performance Optimisation: Monitors, tunes, and reports on the performance of data pipelines and databases to ensure they meet functional and performance requirements.
Security and Compliance: Implements security measures to protect data integrity and ensure compliance with data protection regulations and company policies.
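As a hedged illustration of the ETL and validation responsibilities above, the following minimal PySpark sketch extracts from a CSV source, applies a sample business rule, quarantines invalid rows, and loads into a Delta table. The paths, schema, tax rate, and quality rules are hypothetical placeholders, not the client's actual pipeline.

# Minimal PySpark ETL sketch: extract from CSV, apply a business rule,
# validate, and load into a Delta table. Paths, schema, and rules are
# hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, DoubleType, DateType

spark = SparkSession.builder.appName("sales-etl").getOrCreate()

# Extract: declare an explicit schema rather than inferring one, so
# malformed records surface early instead of silently becoming strings.
schema = StructType([
    StructField("order_id", StringType(), False),
    StructField("amount", DoubleType(), True),
    StructField("order_date", DateType(), True),
])
raw = spark.read.csv("/landing/sales/*.csv", header=True, schema=schema)

# Transform: example business rule -- keep positive amounts and add a tax column.
transformed = (
    raw.filter(F.col("amount") > 0)
       .withColumn("amount_incl_tax", F.round(F.col("amount") * 1.2, 2))
)

# Validate: quarantine rows that fail basic quality checks instead of loading them.
valid = transformed.filter(F.col("order_id").isNotNull() & F.col("order_date").isNotNull())
rejected = transformed.subtract(valid)
rejected.write.mode("append").parquet("/quarantine/sales")

# Load: append into a centralised Delta table (assumes Delta Lake is available).
valid.write.format("delta").mode("append").save("/curated/sales")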
Required Skills:
Azure Data Factory
Azure Synapse Analytics
Azure Data Lake Storage or Delta Lake
Apache Spark or Databricks
Knowledge of implementing Apache Spark workloads using PySpark (a minimal sketch follows this list)
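As a small, non-authoritative example of the PySpark and Delta Lake skills listed above, the following sketch configures a local Spark session with open-source Delta Lake support; it assumes the pyspark and delta-spark packages are installed, and the table path is a placeholder.

# Minimal local Spark + Delta Lake session sketch (assumes the open-source
# delta-spark package is installed: pip install pyspark delta-spark).
from pyspark.sql import SparkSession
from delta import configure_spark_with_delta_pip

builder = (
    SparkSession.builder.appName("delta-demo")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
)
spark = configure_spark_with_delta_pip(builder).getOrCreate()

# Write and read back a tiny Delta table; the path is a placeholder.
df = spark.createDataFrame([(1, "alpha"), (2, "beta")], ["id", "label"])
df.write.format("delta").mode("overwrite").save("/tmp/demo_delta")
spark.read.format("delta").load("/tmp/demo_delta").show()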