
Data Engineer Lead

ImmersiveData.AI

5 - 7 years

Pune

Posted: 15/03/2026


Job Description: Data Engineer (Lead)


Location: Pune

Experience: 5-7 Years

Job Type: Full-Time (Hybrid)


About the Role

We are looking for an experienced Lead Data Engineer to drive a large-scale ETL modernization and migration initiative. The project focuses on transforming a legacy IBM DataStage ETL platform into a modern cloud-based data architecture using Databricks and Azure Data Factory (ADF).

The goal of this migration is to significantly enhance scalability, performance, maintainability, and cloud integration by replacing traditional ETL jobs with Spark-based distributed data processing on Databricks.

The ideal candidate will take on both a technical leadership role and hands-on development responsibilities, ensuring smooth migration, troubleshooting, and optimization of data pipelines.


Key Responsibilities


Technical Leadership

  • Lead a POD of data engineers, coordinating daily activities and sprint deliverables.
  • Work closely with clients and stakeholders to discuss project progress, blockers, and technical solutions.
  • Provide technical guidance and mentorship to team members.
  • Participate in design discussions and architecture decisions for cloud data pipelines.


Data Engineering & Development

  • Design, develop, and optimize data pipelines using Databricks (Apache Spark).
  • Translate legacy IBM DataStage ETL pipelines into modern Databricks and ADF workflows.
  • Contribute to hands-on development, debugging, and troubleshooting of data pipelines.


Performance Optimization

  • Perform SQL performance tuning and identify query bottlenecks.
  • Implement optimization techniques such as:
      ◦ Query optimization
      ◦ Partitioning
      ◦ Clustering
      ◦ Efficient data processing patterns


Pipeline Execution & Debugging

  • Manage end-to-end pipeline execution in Databricks and Spark.
  • Debug failures, analyze logs, and resolve runtime issues in distributed data pipelines.


Data Quality & Issue Resolution

  • Investigate SIT and UAT data issues and perform root cause analysis.
  • Validate data lineage and transformation logic.
  • Ensure functional parity between legacy DataStage jobs and migrated pipelines.


ADF Orchestration

  • Develop and maintain Azure Data Factory pipelines for workflow orchestration.
  • Monitor and troubleshoot ADF pipeline failures and execution issues.


Project Tracking

  • Manage project tasks and progress using JIRA.
  • Track development status, bugs, and sprint deliverables.


Required Skills


Core Data Engineering

  • Strong experience in Apache Spark (Databricks)
  • Expertise in SQL and query performance tuning
  • Experience in ETL pipeline development


Cloud & Data Platforms

  • Hands-on experience with Azure Data Factory (ADF)
  • Experience working with Databricks for large-scale data processing
  • Understanding of cloud data architecture


Legacy ETL Understanding

  • Experience or familiarity with IBM DataStage for understanding legacy logic and validating migrated pipelines.


Data Debugging & Analysis

  • Strong experience in data validation, data lineage, and troubleshooting pipeline issues


Project & Collaboration Tools

  • Experience using JIRA for Agile project tracking
  • Experience working in SIT/UAT environments


Leadership Skills

  • Experience leading data engineering teams or PODs
  • Strong client communication and stakeholder management
  • Ability to guide team members and resolve technical blockers

Nice to Have

  • Experience in large-scale ETL migration projects
  • Knowledge of data lake architectures
  • Experience with CI/CD pipelines for data engineering
  • Understanding of modern data engineering best practices
