Data Engineer - Scala Spark
NielsenIQ
2 - 5 years
Chennai
Posted: 10/12/2025
Job Description
Role Summary:
Design, build, and optimize large-scale ETL and data-processing pipelines handling GB-to-TB data volumes. Operate within the Databricks ecosystem and drive migration of selected workloads to high-performance engines such as Polars and DuckDB. Maintain strong engineering rigor across CI/CD, testing, and code-quality enforcement. Apply analytical thinking to solve data reliability, performance, and scalability problems. Familiarity with AI concepts is advantageous.
Core Responsibilities:
- Develop and maintain distributed data pipelines using Scala, Spark, Delta, and Databricks (a minimal sketch follows this list).
- Engineer robust ETL workflows tuned for high-volume ingestion, transformation, and publishing.
- Profile pipelines, remove bottlenecks, and optimize compute, storage, and job orchestration.
- Lead migration of suitable workloads to Polars, DuckDB, or equivalent high-performance engines.
- Implement CI/CD workflows with automated builds, tests, deployments, and environment gating.
- Enforce coding standards through code coverage targets, unit/integration tests, and SonarQube rules.
- Ensure pipeline observability: logging, data quality checks, lineage, and failure diagnostics.
- Apply analytical reasoning to triage complex data issues and deliver root-cause clarity.
- Contribute to AI-aligned initiatives when required: retrieval-augmented generation (RAG) design, fine-tuning workflows, and agentic patterns.
- Collaborate with product, analytics, and platform teams to operationalize data solutions.
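
Illustrative example (not part of the formal requirements): a minimal sketch of the kind of Scala/Spark-to-Delta ingestion pipeline described above. The paths, column names, and schema are hypothetical.

```scala
import org.apache.spark.sql.{SparkSession, functions => F}

object SalesIngest {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("sales-ingest")
      .getOrCreate()

    // Read raw CSV from a hypothetical landing path.
    val raw = spark.read
      .option("header", "true")
      .csv("/mnt/raw/sales/")

    // Clean and type the data; add ingestion metadata.
    val cleaned = raw
      .filter(F.col("amount").isNotNull)
      .withColumn("amount", F.col("amount").cast("decimal(18,2)"))
      .withColumn("ingest_date", F.current_date())

    // Publish as an append-only, date-partitioned Delta table
    // (the Delta format is built into the Databricks runtime).
    cleaned.write
      .format("delta")
      .mode("append")
      .partitionBy("ingest_date")
      .save("/mnt/curated/sales/")

    spark.stop()
  }
}
```

Partitioning on a derived date column rather than a raw timestamp keeps partition counts manageable at GB-to-TB scale.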
Required Skills and Experience:
- 3+ years in data engineering with strong command of Scala and Spark.
- Proven background in ETL design, distributed processing, and high-volume data systems.
- Hands-on experience with Databricks (jobs, clusters, notebooks, Delta Lake).
- Proficiency in workflow optimization, performance tuning, and memory management.
- Experience with Polars, DuckDB, or similar columnar/accelerated engines (see the DuckDB sketch after this list).
- CI/CD discipline using Git-based pipelines; strong testing and code-quality practices.
- Familiarity with SonarQube, coverage metrics, and static analysis.
- Strong analytical and debugging capability across data, pipelines, and infra.
- Exposure to AI concepts: embeddings, vector stores, retrieval-augmented generation, fine-tuning, agentic architectures.
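
Illustrative example of the engine-migration skill: DuckDB ships a JDBC driver (Maven artifact org.duckdb:duckdb_jdbc), so a candidate workload can be probed directly from Scala without leaving the JVM. The Parquet path and query below are hypothetical; Polars, by contrast, is typically driven from Python or Rust.

```scala
import java.sql.DriverManager

object DuckDbProbe {
  def main(args: Array[String]): Unit = {
    // Open an in-memory DuckDB session via the JDBC driver.
    Class.forName("org.duckdb.DuckDBDriver")
    val conn = DriverManager.getConnection("jdbc:duckdb:")
    try {
      val stmt = conn.createStatement()
      // DuckDB scans Parquet files directly; the path is hypothetical.
      val rs = stmt.executeQuery(
        "SELECT store_id, SUM(amount) AS total " +
          "FROM read_parquet('/mnt/curated/sales/*.parquet') " +
          "GROUP BY store_id ORDER BY total DESC LIMIT 10"
      )
      while (rs.next()) {
        println(s"${rs.getString("store_id")} -> ${rs.getString("total")}")
      }
    } finally {
      conn.close()
    }
  }
}
```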
Preferred:
- Experience with Azure cloud environments.
- Experience in metadata-driven or config-driven pipeline frameworks (a brief sketch follows).
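
Illustrative example of a config-driven pipeline step. The config model, paths, and expressions are hypothetical; a real framework would parse them from YAML or JSON rather than inlining them.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Minimal, hypothetical config model: where to read, which Spark SQL
// expressions to apply, and where to write.
final case class StepConfig(
  inputPath: String,
  selectExprs: Seq[String],
  outputPath: String
)

object ConfigDrivenRunner {
  def runStep(spark: SparkSession, cfg: StepConfig): Unit = {
    val in: DataFrame = spark.read.format("delta").load(cfg.inputPath)
    // selectExpr lets the config carry transformations as SQL expressions.
    val out = in.selectExpr(cfg.selectExprs: _*)
    out.write.format("delta").mode("overwrite").save(cfg.outputPath)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("config-driven-demo").getOrCreate()
    // Inlined for brevity; paths and expressions are illustrative only.
    val step = StepConfig(
      inputPath   = "/mnt/curated/sales/",
      selectExprs = Seq("store_id", "amount", "ingest_date"),
      outputPath  = "/mnt/published/sales_daily/"
    )
    runStep(spark, step)
    spark.stop()
  }
}
```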