Overview

We are seeking skilled Data Engineers to join our Data & Digital Twin Foundation team. You will design, build, and maintain data pipelines that power digital twin platforms, real-time operational systems, and AI/ML workloads. Working closely with data architects, simulation engineers, and ML teams, you will transform raw operational data into high-quality, governed datasets that drive intelligent decision-making.

Our core data platform stack includes:

Data Platform & Lakehouse

Databricks (PySpark, Databricks SQL) for unified analytics and data engineering
Delta Lake for ACID-compliant lakehouse architecture
Unity Catalog for data governance, lineage, and access control

Stream & Event Processing

Apache Kafka for real-time event ingestion
Structured Streaming for continuous data processing
Delta Live Tables for declarative, quality-enforced pipelines

Specialized Data Stores & Transformation

Neo4j for graph data modeling and network topology
Python and SQL for data transformation

Data Quality

Delta Live Tables expectations for data validation
Data profiling and anomaly detection

Key Responsibilities

Design, develop, and maintain scalable data pipelines using Databricks, PySpark, and Delta Lake
Build real-time and batch data ingestion pipelines from diverse operational systems
Implement data transformations that serve digital twin platforms and operational analytics
Develop and maintain graph data models in Neo4j for network topology and relationship modeling
Integrate Kafka event streams with Databricks for real-time operational state updates
Implement data quality checks using Delta Live Tables expectations
Ensure data governance compliance through Unity Catalog (lineage, access control, metadata)
Optimize pipeline performance, reliability, and cost efficiency
Write clean, well-documented, and testable code following engineering best practices
Collaborate with ML engineers to deliver feature-engineered datasets
Participate in code reviews, knowledge sharing, and continuous improvement initiatives
Support production data systems through monitoring, troubleshooting, and incident resolution

Preferred Qualifications

7+ years of hands-on data engineering experience
Track record of building and maintaining production-grade data pipelines
Experience with Delta Live Tables for declarative pipeline development
Experience working in agile, cross-functional teams
Familiarity with time-series data patterns and operational data modeling

Highly Desirable

Experience building data pipelines for digital twin or simulation platforms
Familiarity with operational state modeling for real-time systems
Exposure to physics-informed or time-series ML feature engineering
Experience working with distributed, multidisciplinary teams
Exposure to industrial domains such as manufacturing, logistics, or transportation

Data Engineer

Hash Agile Technologies
