
Data Engineer - Scala Spark

NielsenIQ

2 - 5 years

Chennai

Posted: 10/12/2025


Job Description

Role Summary:


Design, build, and optimize large-scale ETL and data-processing pipelines handling GB-to-TB data volumes. Operate within the Databricks ecosystem and drive migration of selected workloads to high-performance engines such as Polars and DuckDB. Maintain strong engineering rigor across CI/CD, testing, and code-quality enforcement. Apply analytical thinking to solve data reliability, performance, and scalability problems. Familiarity with AI concepts is advantageous.


Core Responsibilities:

  • Develop and maintain distributed data pipelines using Scala, Spark, Delta, and Databricks.
  • Engineer robust ETL workflows tuned for high-volume ingestion, transformation, and publishing.
  • Profile pipelines, remove bottlenecks, and optimize compute, storage, and job orchestration.
  • Lead migration of suitable workloads to Polars, DuckDB, or equivalent high-performance engines.
  • Implement CI/CD workflows with automated builds, tests, deployments, and environment gating.
  • Enforce coding standards through code coverage targets, unit/integration tests, and SonarQube rules.
  • Ensure pipeline observability: logging, data quality checks, lineage, and failure diagnostics.
  • Apply analytical reasoning to triage complex data issues and deliver root-cause clarity.
  • Contribute to AI-aligned initiatives when required: retrieval-augmented generation (RAG) design, fine-tuning workflows, and agentic patterns.
  • Collaborate with product, analytics, and platform teams to operationalize data solutions.


Required Skills and Experience:

  • 3+ years in data engineering with strong command of Scala and Spark.
  • Proven background in ETL design, distributed processing, and high-volume data systems.
  • Hands-on experience with Databricks (jobs, clusters, notebooks, Delta Lake).
  • Proficiency in workflow optimization, performance tuning, and memory management.
  • Experience with Polars, DuckDB, or similar columnar/accelerated engines.
  • CI/CD discipline using Git-based pipelines; strong testing and code-quality practices.
  • Familiarity with SonarQube, coverage metrics, and static analysis.
  • Strong analytical and debugging capability across data, pipelines, and infra.
  • Exposure to AI concepts: embeddings, vector stores, retrieval-augmented generation, fine-tuning, agentic architectures.

Preferred:

  • Experience with Azure cloud environments.
  • Experience in metadata-driven or config-driven pipeline frameworks.
