Lead Data Engineer - PySpark
enGen Global
8 - 12 years
Hyderabad
Posted: 14/03/2026
Job Description
Role - Lead Data Engineer
Key skills required - PySpark, Kafka, GCP, SQL, Python
Experience - 8-12 Yrs
Job Location - Hyderabad / Chennai
Role Summary:
We are seeking a highly experienced and technically proficient Lead Data Engineer to play a crucial role in the design, development, and optimization of our next-generation data pipelines on Google Cloud Platform (GCP). As we migrate to Dataproc, Apache Flink, Kafka, and dbt, you will be instrumental in building robust, scalable, and efficient solutions for processing our vast and complex healthcare datasets. The role involves extensive interfacing and coordination with senior technical staff across India and the US, as well as leading the design and development of the data ingestion framework. It demands deep technical expertise, strong problem-solving skills, and the ability to independently drive complex data initiatives from conception to deployment.
Essential Responsibilities
- Understand the overall requirements and design of the enterprise cloud migration strategy for data.
- Design, develop, and optimize high-performance data pipelines for batch and real-time data processing on GCP, working with large and complex datasets.
- Implement advanced data ingestion, transformation, and loading (ETL/ELT) solutions using PySpark for large-scale data processing on Dataproc (see the first sketch after this list).
- Build and maintain robust real-time data streaming applications using Apache Kafka (see the second sketch after this list).
- Translate complex business requirements into technical specifications and efficient data solutions, ensuring data quality, reliability, and security.
- Work closely with other architects and tech leads in India and the US to understand the existing data pipelines developed on Databricks and GCP.
- Explore designs and create POCs to address business requirements.
- Lead other developers, review their code, and ensure high-quality, optimized code is delivered.
- Provide regular updates on tasks, status, and risks to the PM and leadership.
- Document technical designs, data flows, and operational procedures thoroughly.
- Stay current with emerging trends and technologies in data engineering, cloud platforms, and big data.
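To make the PySpark-on-Dataproc responsibility concrete, here is a minimal sketch of the kind of batch ETL job this role would design and tune. The dataset, column names, and gs://example-lake paths are hypothetical placeholders, not details from this posting.

```python
# Minimal batch ETL sketch for PySpark on Dataproc.
# All names and paths below are illustrative assumptions.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("claims-batch-etl").getOrCreate()

# Ingest: read raw records from Cloud Storage.
raw = spark.read.parquet("gs://example-lake/raw/claims/")

# Transform: deduplicate, filter bad rows, add an audit column.
cleaned = (
    raw.dropDuplicates(["claim_id"])
       .filter(F.col("claim_amount") > 0)
       .withColumn("ingest_date", F.current_date())
)

# Load: write partitioned output back to the curated zone.
cleaned.write.mode("overwrite").partitionBy("ingest_date").parquet(
    "gs://example-lake/curated/claims/"
)
```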
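For the Kafka streaming responsibility, one common approach on Dataproc is Spark Structured Streaming, sketched below; the posting does not prescribe a specific consumer framework. The broker address, topic name, and paths are hypothetical, and the job assumes the spark-sql-kafka connector package is available on the cluster.

```python
# Minimal real-time sketch: consume a Kafka topic with Spark
# Structured Streaming. Brokers, topic, and paths are assumptions.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("claims-stream").getOrCreate()

# Subscribe to the Kafka topic.
events = (
    spark.readStream.format("kafka")
         .option("kafka.bootstrap.servers", "broker-1:9092")
         .option("subscribe", "claim-events")
         .load()
)

# Kafka delivers key/value as binary; decode the payload first.
decoded = events.select(F.col("value").cast("string").alias("payload"))

# Sink: append decoded events to Cloud Storage with checkpointing
# so the stream can recover from where it left off.
query = (
    decoded.writeStream.format("parquet")
           .option("path", "gs://example-lake/streaming/claims/")
           .option("checkpointLocation", "gs://example-lake/checkpoints/claims/")
           .outputMode("append")
           .start()
)
query.awaitTermination()
```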
The experience we are looking to add to our team
Required Skills
- Bachelor's degree or higher from a reputed university.
- 8-12 years of hands-on experience in data engineering and big data roles with a strong data focus.
- Deep, proven experience with Apache Spark / PySpark for designing, developing, and tuning large-scale data pipeline and transformation jobs.
- Hands-on experience with Apache Kafka, including designing and implementing real-time data streaming solutions.
- Solid understanding of distributed systems, big data technologies, and cloud-native data architectures.
- Exceptional proficiency in SQL and extensive experience with advanced data warehousing concepts and various data modeling techniques.
- Strong hands-on experience with the big data technology stack on any major cloud platform, preferably GCP.
- Experience designing and deploying data pipeline orchestration using Apache Airflow / Cloud Composer (see the sketch after this list).
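As a rough illustration of the Airflow / Cloud Composer orchestration skill above, the DAG below submits a PySpark job to Dataproc via the Google provider's DataprocSubmitJobOperator. The project, cluster, region, and file URI are hypothetical, and the schedule argument assumes Airflow 2.4 or later.

```python
# Minimal orchestration sketch: a daily Airflow DAG that submits a
# PySpark job to Dataproc. All identifiers below are assumptions.
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.operators.dataproc import (
    DataprocSubmitJobOperator,
)

PYSPARK_JOB = {
    "reference": {"project_id": "example-project"},
    "placement": {"cluster_name": "example-cluster"},
    "pyspark_job": {"main_python_file_uri": "gs://example-lake/jobs/claims_etl.py"},
}

with DAG(
    dag_id="claims_daily_etl",
    start_date=datetime(2026, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    DataprocSubmitJobOperator(
        task_id="run_claims_etl",
        job=PYSPARK_JOB,
        region="us-central1",
        project_id="example-project",
    )
```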
Good to have
- Extensive hands-on experience with GCP data services, particularly Dataproc, BigQuery, Pub/Sub, and Cloud Storage.
- Proven experience with Apache Flink for high-throughput, low-latency stream processing, including designing and deploying Flink applications.
- Expertise in using dbt for managing complex SQL transformations, establishing data lineage, and implementing data governance practices.
- Significant prior experience within the US Healthcare industry and navigating regulatory requirements (e.g., HIPAA).