Data Engineer - Scala Spark
NielsenIQ
2 - 5 years
Chennai
Posted: 10/12/2025
Job Description
Role Summary:
Design, build, and optimize large-scale ETL and data-processing pipelines handling GB-to-TB data volumes. Operate within the Databricks ecosystem and drive migration of selected workloads to high-performance engines such as Polars and DuckDB. Maintain strong engineering rigor across CI/CD, testing, and code-quality enforcement. Apply analytical thinking to solve data reliability, performance, and scalability problems. Familiarity with AI concepts is advantageous.
Core Responsibilities:
- Develop and maintain distributed data pipelines using Scala, Spark, Delta, and Databricks (a minimal sketch follows this list).
- Engineer robust ETL workflows tuned for high-volume ingestion, transformation, and publishing.
- Profile pipelines, remove bottlenecks, and optimize compute, storage, and job orchestration.
- Lead migration of suitable workloads to Polars, DuckDB, or equivalent high-performance engines.
- Implement CI/CD workflows with automated builds, tests, deployments, and environment gating.
- Enforce coding standards through code coverage targets, unit/integration tests, and SonarQube rules.
- Ensure pipeline observability: logging, data quality checks, lineage, and failure diagnostics.
- Apply analytical reasoning to triage complex data issues and deliver root-cause clarity.
- Contribute to AI-aligned initiatives when required: retrieval-augmented generation (RAG) design, fine-tuning workflows, and agentic patterns.
- Collaborate with product, analytics, and platform teams to operationalize data solutions.
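
Illustrative example (not part of the formal requirements): a minimal sketch of the kind of Scala/Spark-to-Delta ingestion pipeline described above. The paths, column names, and schema are hypothetical.

```scala
import org.apache.spark.sql.{SparkSession, functions => F}

object SalesIngest {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("sales-ingest")
      .getOrCreate()

    // Read raw CSV from a hypothetical landing path.
    val raw = spark.read
      .option("header", "true")
      .csv("/mnt/raw/sales/")

    // Clean and type the data; add ingestion metadata.
    val cleaned = raw
      .filter(F.col("amount").isNotNull)
      .withColumn("amount", F.col("amount").cast("decimal(18,2)"))
      .withColumn("ingest_date", F.current_date())

    // Publish as an append-only, date-partitioned Delta table
    // (the Delta format is built into the Databricks runtime).
    cleaned.write
      .format("delta")
      .mode("append")
      .partitionBy("ingest_date")
      .save("/mnt/curated/sales/")

    spark.stop()
  }
}
```

Partitioning on a derived date column rather than a raw timestamp keeps partition counts manageable at GB-to-TB scale.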
Required Skills and Experience:
- 3+ years in data engineering with strong command of Scala and Spark.
- Proven background in ETL design, distributed processing, and high-volume data systems.
- Hands-on experience with Databricks (jobs, clusters, notebooks, Delta Lake).
- Proficiency in workflow optimization, performance tuning, and memory management.
- Experience with Polars, DuckDB, or similar columnar/accelerated engines (see the DuckDB sketch after this list).
- CI/CD discipline using Git-based pipelines; strong testing and code-quality practices.
- Familiarity with SonarQube, coverage metrics, and static analysis.
- Strong analytical and debugging capability across data, pipelines, and infra.
- Exposure to AI concepts: embeddings, vector stores, retrieval-augmented generation, fine-tuning, agentic architectures.
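
Illustrative example of the engine-migration skill: DuckDB ships a JDBC driver (Maven artifact org.duckdb:duckdb_jdbc), so a candidate workload can be probed directly from Scala without leaving the JVM. The Parquet path and query below are hypothetical; Polars, by contrast, is typically driven from Python or Rust.

```scala
import java.sql.DriverManager

object DuckDbProbe {
  def main(args: Array[String]): Unit = {
    // Open an in-memory DuckDB session via the JDBC driver.
    Class.forName("org.duckdb.DuckDBDriver")
    val conn = DriverManager.getConnection("jdbc:duckdb:")
    try {
      val stmt = conn.createStatement()
      // DuckDB scans Parquet files directly; the path is hypothetical.
      val rs = stmt.executeQuery(
        "SELECT store_id, SUM(amount) AS total " +
          "FROM read_parquet('/mnt/curated/sales/*.parquet') " +
          "GROUP BY store_id ORDER BY total DESC LIMIT 10"
      )
      while (rs.next()) {
        println(s"${rs.getString("store_id")} -> ${rs.getString("total")}")
      }
    } finally {
      conn.close()
    }
  }
}
```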
Preferred:
- Experience with Azure cloud environments.
- Experience in metadata-driven or config-driven pipeline frameworks (a brief sketch follows).
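
Illustrative example of a config-driven pipeline step. The config model, paths, and expressions are hypothetical; a real framework would parse them from YAML or JSON rather than inlining them.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Minimal, hypothetical config model: where to read, which Spark SQL
// expressions to apply, and where to write.
final case class StepConfig(
  inputPath: String,
  selectExprs: Seq[String],
  outputPath: String
)

object ConfigDrivenRunner {
  def runStep(spark: SparkSession, cfg: StepConfig): Unit = {
    val in: DataFrame = spark.read.format("delta").load(cfg.inputPath)
    // selectExpr lets the config carry transformations as SQL expressions.
    val out = in.selectExpr(cfg.selectExprs: _*)
    out.write.format("delta").mode("overwrite").save(cfg.outputPath)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("config-driven-demo").getOrCreate()
    // Inlined for brevity; paths and expressions are illustrative only.
    val step = StepConfig(
      inputPath   = "/mnt/curated/sales/",
      selectExprs = Seq("store_id", "amount", "ingest_date"),
      outputPath  = "/mnt/published/sales_daily/"
    )
    runStep(spark, step)
    spark.stop()
  }
}
```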