Key Responsibilities:
• Contribute to the development of scalable, performant data pipelines on Databricks, leveraging Delta Lake, Delta Live Tables (DLT), and other core Databricks components.
• Develop data lakes and warehouses optimized for storage, querying, and real-time updates using Delta Lake.
• Implement effective data ingestion strategies from various sources (streaming, batch, API-based), ensuring seamless integration with Databricks.
• Ensure the integrity, security, quality, and governance of data across our Databricks-centric platforms.
• Collaborate with stakeholders (data scientists, analysts, product teams) to translate business requirements into Databricks-native data solutions.
• Build and maintain ETL/ELT processes, heavily utilizing Databricks, Spark (Scala or Python), SQL, and Delta Lake for transformations.
• Apply CI/CD and DevOps practices tailored to the Databricks environment.
• Monitor and optimize the cost-efficiency of data operations on Databricks, ensuring efficient resource utilization.
• Use Databricks tooling, including the Databricks CLI and REST API, alongside Apache Spark™, to develop, manage, and optimize data engineering solutions.
Work Experience:
• 5 years of overall experience, with at least 3 years of relevant experience.
• 3 years of experience working with Azure (or another cloud platform) and Databricks.
Skills:
• Proficiency in Spark, Delta Lake, Structured Streaming, and other Azure Databricks functionalities for sophisticated data pipeline construction.
• Strong capability in diagnosing and optimizing Spark applications and Databricks workloads, including strategic cluster sizing and configuration.
• Expertise in designing data solutions that leverage Azure Databricks ecosystem technologies for enhanced data management and processing efficiency.
• Profound knowledge of data governance and data security, coupled with an understanding of large-scale distributed systems and cloud architecture design.
• Experience with a variety of data sources and BI tools.
Key Relationships:
• Internal – Data Engineering Manager; developers across various departments; managers of departments in other regional hubs of Puma Energy.
• External – Platform providers.