Lead Data Engineer - PySpark
enGen Global
8 - 12 years
Hyderabad
Posted: 14/03/2026
Job Description
Role - Lead Data Engineer
Key skills required - PySpark, Kafka, GCP, SQL, Python
Experience - 8-12 Yrs
Job Location - Hyderabad / Chennai
Role Summary:
We are seeking a highly experienced and technically proficient Lead Data Engineer to play a crucial role in the design, development, and optimization of our next-generation data pipelines on Google Cloud Platform (GCP). As we migrate to Dataproc, Apache Flink, Kafka, and dbt, you will be instrumental in building robust, scalable, and efficient solutions for processing our vast and complex healthcare datasets. The role involves extensive interfacing and coordination with senior technical staff across India and the US, as well as leading the design and development of the data ingestion framework. It demands deep technical expertise, strong problem-solving skills, and the ability to independently drive complex data initiatives from conception to deployment.
Essential Responsibilities
- Understand the overall requirements and design of the enterprise cloud migration strategy for data.
- Design, develop, and optimize high-performance data pipelines for batch and real-time data processing on GCP, working with large and complex datasets.
- Implement advanced data ingestion, transformation, and loading (ETL/ELT) solutions using PySpark for large-scale data processing on Dataproc (see the first sketch after this list).
- Build and maintain robust real-time data streaming applications using Apache Kafka (see the second sketch after this list).
- Translate complex business requirements into technical specifications and efficient data solutions, ensuring data quality, reliability, and security.
- Work closely with other architects and tech leads in India and the US to understand the existing data pipelines developed on Databricks and GCP.
- Explore designs and create POCs to address business requirements.
- Lead other developers, review their code, and ensure high-quality, optimized code is delivered.
- Provide regular updates on tasks, status, and risks to the PM and leadership.
- Document technical designs, data flows, and operational procedures thoroughly.
- Stay current with emerging trends and technologies in data engineering, cloud platforms, and big data.
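To make the PySpark-on-Dataproc responsibility concrete, here is a minimal sketch of the kind of batch ETL job this role would design and tune. The dataset, column names, and gs://example-lake paths are hypothetical placeholders, not details from this posting.

```python
# Minimal batch ETL sketch for PySpark on Dataproc.
# All names and paths below are illustrative assumptions.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("claims-batch-etl").getOrCreate()

# Ingest: read raw records from Cloud Storage.
raw = spark.read.parquet("gs://example-lake/raw/claims/")

# Transform: deduplicate, filter bad rows, add an audit column.
cleaned = (
    raw.dropDuplicates(["claim_id"])
       .filter(F.col("claim_amount") > 0)
       .withColumn("ingest_date", F.current_date())
)

# Load: write partitioned output back to the curated zone.
cleaned.write.mode("overwrite").partitionBy("ingest_date").parquet(
    "gs://example-lake/curated/claims/"
)
```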
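For the Kafka streaming responsibility, one common approach on Dataproc is Spark Structured Streaming, sketched below; the posting does not prescribe a specific consumer framework. The broker address, topic name, and paths are hypothetical, and the job assumes the spark-sql-kafka connector package is available on the cluster.

```python
# Minimal real-time sketch: consume a Kafka topic with Spark
# Structured Streaming. Brokers, topic, and paths are assumptions.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("claims-stream").getOrCreate()

# Subscribe to the Kafka topic.
events = (
    spark.readStream.format("kafka")
         .option("kafka.bootstrap.servers", "broker-1:9092")
         .option("subscribe", "claim-events")
         .load()
)

# Kafka delivers key/value as binary; decode the payload first.
decoded = events.select(F.col("value").cast("string").alias("payload"))

# Sink: append decoded events to Cloud Storage with checkpointing
# so the stream can recover from where it left off.
query = (
    decoded.writeStream.format("parquet")
           .option("path", "gs://example-lake/streaming/claims/")
           .option("checkpointLocation", "gs://example-lake/checkpoints/claims/")
           .outputMode("append")
           .start()
)
query.awaitTermination()
```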
The experience we are looking to add to our team
Required Skills
- Bachelor's degree or higher from a reputed university.
- 8-12 years of hands-on experience in data engineering and big data roles with a strong data focus.
- Deep, proven experience with Apache Spark / PySpark for designing, developing, and tuning large-scale data pipeline and transformation jobs.
- Hands-on experience with Apache Kafka, including designing and implementing real-time data streaming solutions.
- Solid understanding of distributed systems, big data technologies, and cloud-native data architectures.
- Exceptional proficiency in SQL and extensive experience with advanced data warehousing concepts and various data modeling techniques.
- Strong hands-on experience with the big data technology stack on any major cloud platform, preferably GCP.
- Experience designing and deploying data pipeline orchestration using Apache Airflow / Cloud Composer (see the sketch after this list).
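As a rough illustration of the Airflow / Cloud Composer orchestration skill above, the DAG below submits a PySpark job to Dataproc via the Google provider's DataprocSubmitJobOperator. The project, cluster, region, and file URI are hypothetical, and the schedule argument assumes Airflow 2.4 or later.

```python
# Minimal orchestration sketch: a daily Airflow DAG that submits a
# PySpark job to Dataproc. All identifiers below are assumptions.
from datetime import datetime

from airflow import DAG
from airflow.providers.google.cloud.operators.dataproc import (
    DataprocSubmitJobOperator,
)

PYSPARK_JOB = {
    "reference": {"project_id": "example-project"},
    "placement": {"cluster_name": "example-cluster"},
    "pyspark_job": {"main_python_file_uri": "gs://example-lake/jobs/claims_etl.py"},
}

with DAG(
    dag_id="claims_daily_etl",
    start_date=datetime(2026, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    DataprocSubmitJobOperator(
        task_id="run_claims_etl",
        job=PYSPARK_JOB,
        region="us-central1",
        project_id="example-project",
    )
```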
Good to have
- Extensive hands-on experience with GCP data services, particularly Dataproc, BigQuery, Pub/Sub, and Cloud Storage.
- Proven experience with Apache Flink for high-throughput, low-latency stream processing, including designing and deploying Flink applications.
- Expertise in using dbt for managing complex SQL transformations, establishing data lineage, and implementing data governance practices.
- Significant prior experience within the US Healthcare industry and navigating regulatory requirements (e.g., HIPAA).