Job Description
Responsibilities:
- Design and Build Scalable Data Pipelines: Architect, develop, and maintain efficient, scalable, and secure data pipelines to handle large datasets across multiple data sources, ensuring reliability and performance.
- Cloud Platform Expertise: Utilize AWS and GCP services (e.g., Amazon S3, Redshift, BigQuery, Dataflow, Cloud Storage, Dataproc) to implement and optimize cloud-based data infrastructure.
- Data Integration: Integrate various data sources, both structured and unstructured, from internal systems and third-party providers, to enable cross-functional teams to access actionable insights.
- Collaboration: Work closely with data scientists, analysts, and other stakeholders to understand data requirements and provide solutions that drive insights for business intelligence, reporting, and analytics.
- Data Quality & Governance: Implement best practices for data quality, data lineage, and governance to ensure the accuracy and compliance of data across pipelines and systems.
- Optimization and Automation: Continuously optimize data workflows and automate processes to improve efficiency and reduce latency in data operations.
- Performance Tuning: Tune data storage, retrieval, and processing performance on cloud platforms to balance cost and time efficiency.
- Security & Compliance: Ensure data privacy and security standards are maintained in alignment with company policies and industry regulations.
Requirements:
- Experience: 5+ years of hands-on experience as a data engineer, with a focus on building data solutions in cloud environments (AWS and GCP).
- Cloud Technologies: Deep expertise in using AWS (e.g., S3, Redshift, Lambda, Glue) and GCP (e.g., BigQuery, Dataflow, Cloud Storage) to build and manage data pipelines and infrastructure.
- Data Engineering Skills: Strong knowledge of ETL/ELT processes, data warehousing concepts, and distributed computing and workflow orchestration frameworks (e.g., Apache Spark, Hadoop, Airflow).
- Programming: Proficiency in Python, SQL, and other relevant programming languages for data engineering.
- Database Knowledge: Experience working with both relational and NoSQL databases (e.g., PostgreSQL, MySQL, DynamoDB, MongoDB).
- Version Control & CI/CD: Familiarity with version control systems (e.g., Git) and CI/CD pipelines for automating deployments and testing.
- Data Processing Frameworks: Experience with data processing and orchestration frameworks such as Apache Airflow, Apache Kafka, or similar technologies.
- Problem Solving: Strong analytical and troubleshooting skills with the ability to resolve complex data and system issues in a timely manner.
- Media Industry Knowledge (Preferred): Familiarity with data needs and challenges within the media industry (e.g., content analytics, user behavior analysis, and media streaming data).
Preferred Qualifications:
- Certifications: AWS Certified Solutions Architect, Google Professional Data Engineer, or similar certifications in cloud technologies.
- Big Data Technologies: Experience with big data tools such as Apache Spark or similar distributed data processing technologies.
- Machine Learning: Exposure to machine learning platforms, or experience working with data science teams to support ML models, is a plus.
Unfortunately, due to the high number of responses we receive, we are unable to provide feedback to all applicants. If you have not been contacted within 5-7 days, please assume that your application has been unsuccessful at this stage.