We are seeking a highly experienced Lead Data Engineer to drive the design, development, and delivery of scalable, reliable, and cost-efficient data platforms. The ideal candidate will have deep expertise in modern data engineering technologies, including distributed data processing, ETL/ELT pipeline development, data modeling, and workflow orchestration, along with hands-on experience with the Databricks Lakehouse Platform and its medallion (Bronze/Silver/Gold) architecture.

This role also requires a solid understanding of Large Language Models (LLMs) and GenAI data ecosystems in order to develop high-quality datasets and retrieval-ready pipelines for AI-powered applications. As a technical leader, you will mentor engineering teams, establish best practices, and collaborate closely with architects, data scientists, and business stakeholders to deliver robust data solutions.

Key Responsibilities

Lead the architecture, development, and optimization of scalable data platforms supporting both batch and streaming workloads.

Design, implement, and maintain ETL/ELT pipelines for data ingestion, transformation, and curation using the Databricks Lakehouse Platform and medallion architecture.

Define data models, schemas, and storage strategies for data lakes and data warehouses to support analytics, reporting, machine learning, and GenAI initiatives.

Develop and curate high-quality datasets, feature engineering pipelines, and retrieval-ready data stores, including embeddings and vector-based data structures, for LLM-powered applications.

Establish engineering standards, coding best practices, code review processes, and CI/CD pipelines to ensure maintainable and reliable solutions.

Build and automate end-to-end data workflows using orchestration tools such as Apache Airflow or equivalent platforms.

Lead migrations from legacy on-premises or cloud-based data warehouses to modern cloud-native and lakehouse architectures.

Optimize performance and cost by implementing effective partitioning, caching, compute tuning, and distributed processing strategies.

Implement robust data governance, security, lineage, and access control frameworks aligned with organizational compliance requirements.

Build monitoring, logging, alerting, and data quality frameworks to ensure reliability and proactive issue resolution.

Mentor and guide data engineers while collaborating with architects, data scientists, and business stakeholders to translate business requirements into scalable technical solutions.

Participate in and lead Agile ceremonies, including sprint planning, stand-ups, retrospectives, and technical reviews.

Required Qualifications

10+ years of hands-on experience in data engineering, including leadership of enterprise-scale data platform initiatives.

Strong expertise with the Databricks Lakehouse Platform, including Delta Lake, Delta Live Tables, Databricks Workflows, and Unity Catalog.

Proven experience implementing the medallion (Bronze/Silver/Gold) architecture for enterprise data platforms.

Deep knowledge of distributed data processing using Apache Spark, including PySpark and Spark SQL.

Extensive experience building scalable ETL/ELT pipelines for both batch and streaming data processing.

Expert proficiency in Python and SQL for data engineering, transformation, validation, and pipeline development.

Strong experience designing and managing data lakes and data warehouses using dimensional and lakehouse modeling techniques.

Practical understanding of Large Language Models (LLMs) and GenAI concepts, including prompts, embeddings, vector databases, Retrieval-Augmented Generation (RAG), and supporting data pipelines.

Hands-on experience with at least one major cloud platform (AWS, Azure, or GCP) and its core data services.

Demonstrated success leading migrations from legacy platforms such as Hadoop or traditional data warehouses to modern cloud and lakehouse environments.

Strong expertise in distributed computing, data partitioning, and performance optimization techniques.

Experience implementing data security, governance, lineage, encryption, IAM, and metadata management.

Solid understanding of object-oriented programming principles, software design patterns, and CI/CD practices.

Experience working within Agile/Scrum environments and mentoring engineering teams.

Excellent analytical, problem-solving, stakeholder management, and communication skills.

Data Engineer

Trantor
