
Job Description

Main Purpose:

• Collaborate with data scientists and business stakeholders to design, develop, and maintain efficient data pipelines feeding the organization's data lake.
• Maintain the integrity and quality of the data lake, enabling accurate, actionable insights for data scientists and informed decision-making for business stakeholders.
• Utilize extensive knowledge of data engineering and cloud technologies to enhance the organization's data infrastructure, promoting a culture of data-driven decision-making.
• Apply data engineering expertise to define and optimize data pipelines, using advanced concepts to improve the efficiency and accessibility of data storage.
• Own the development of an extensive data catalog, ensuring robust data governance and facilitating effective data access and utilization across the organization.

Key Responsibilities

• Contribute to the development of scalable, performant data pipelines on Databricks, leveraging Delta Lake, Delta Live Tables (DLT), and other core Databricks components.
• Develop data lakes/warehouses designed for optimized storage, querying, and real-time updates using Delta Lake.
• Implement effective data ingestion strategies from various sources (streaming, batch, API-based), ensuring seamless integration with Databricks.
• Ensure the integrity, security, quality, and governance of data across our Databricks-centric platforms.
• Collaborate with stakeholders (data scientists, analysts, product teams) to translate business requirements into Databricks-native data solutions.
• Build and maintain ETL/ELT processes, heavily utilizing Databricks, Spark (Scala or Python), SQL, and Delta Lake for transformations.
• Apply CI/CD and DevOps practices tailored to the Databricks environment.
• Monitor and optimize the cost-efficiency of data operations on Databricks, ensuring optimal resource utilization.
• Utilize a range of Databricks tools, including the Databricks CLI and REST API, alongside Apache Spark™, to develop, manage, and optimize data engineering solutions.
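The last bullet mentions the Databricks REST API. As a minimal illustration only — the workspace host and token below are placeholders, not real credentials — an authenticated request to the documented Jobs API 2.1 list endpoint can be assembled with Python's standard library:

```python
import urllib.request


def build_jobs_list_request(host: str, token: str, limit: int = 25) -> urllib.request.Request:
    """Build an authenticated GET request for the Databricks Jobs API 2.1
    list endpoint (GET /api/2.1/jobs/list). The request is only constructed
    here, not sent."""
    url = f"https://{host}/api/2.1/jobs/list?limit={limit}"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {token}"},  # personal access token auth
        method="GET",
    )


# Placeholder workspace host and token -- replace with real values before
# sending the request with urllib.request.urlopen(req).
req = build_jobs_list_request("adb-1234567890123456.7.azuredatabricks.net", "dapi-example-token")
print(req.full_url)
```

In practice the same call is usually made through the Databricks CLI or an SDK; the sketch only shows the shape of the underlying HTTP request.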


Work Experience:

• 5 years of overall experience, with at least 3 years of relevant experience.
• 3 years of experience working with Azure (or another cloud platform) and Databricks.


Skills:
• Proficiency in Spark, Delta Lake, Structured Streaming, and other Azure Databricks functionalities for sophisticated data pipeline construction.
• Strong capability in diagnosing and optimizing Spark applications and Databricks workloads, including strategic cluster sizing and configuration.
• Expertise in data-sharing solutions that leverage Azure Databricks ecosystem technologies for enhanced data management and processing efficiency.
• Profound knowledge of data governance and data security, coupled with an understanding of large-scale distributed systems and cloud architecture design.
• Experience with a variety of data sources and BI tools.






Key Relationships and Department Overview:

• Internal – Data Engineering Manager; developers across various departments; managers of departments in other regional hubs of Puma Energy.
• External – Platform providers.

