Data Engineer (2+ years of experience) | Mumbai
YMinds.AI
2 - 5 years
Mumbai
Posted: 21/02/2026
Job Description
About the Role
Our client is seeking a Data Engineer to join their high-performance engineering team in the AdTech domain. This is a full-time, individual contributor role designed for a technically strong professional with expertise in large-scale data processing and distributed systems.
The ideal candidate will be responsible for designing and maintaining scalable ETL pipelines, managing terabyte-scale datasets, and building resilient data infrastructure. This role requires ownership of end-to-end data workflows from ingestion and transformation to optimization and performance tuning.
Key Responsibilities
Design, develop, and maintain scalable ETL pipelines on self-managed infrastructure.
Optimize data workflows to process terabytes of structured and unstructured data efficiently.
Build robust batch and stream processing solutions using Apache Spark (preferred) and/or Apache Flink.
Integrate, clean, and transform data from multiple sources, ensuring seamless data flow.
Monitor, troubleshoot, and optimize data pipelines for performance, scalability, and reliability.
Ensure high standards of data quality, consistency, and governance across workflows.
Collaborate closely with data scientists, analysts, and cross-functional stakeholders to deliver actionable insights.
Participate in code reviews and contribute to high-quality, production-ready releases.
Mentor junior engineers and promote best practices in data engineering.
Required Skills
2+ years of experience building ETL pipelines using Apache Spark and/or Apache Flink.
Strong experience handling large-scale distributed data (terabyte-scale environments).
Hands-on experience with ScyllaDB, Aerospike, or similar high-performance NoSQL/caching systems.
Strong understanding of Data Lake architectures and tools such as Delta Lake.
Proficiency in Scala, Python, or Java.
Strong knowledge of SQL, data modeling, and data warehousing concepts.
Experience working with cloud data services such as AWS S3, Azure Data Lake, or Google BigQuery.
Experience with version control (Git) and CI/CD workflows.
Strong problem-solving skills and ability to thrive in a fast-paced environment.
Experience with Apache Kafka and Apache Airflow is a plus.
Qualifications
Bachelor's degree in Computer Science, Engineering, or a related field.
Proven experience delivering high-performance data processing solutions in distributed systems.
Strong analytical mindset with attention to scalability and performance optimization.
Why Join?
Work on cutting-edge AdTech solutions processing massive datasets at scale.
Opportunity to directly impact high-performance data infrastructure and business intelligence.
Collaborative engineering culture focused on innovation and operational excellence.
Strong growth opportunities within a fast-evolving data-driven ecosystem.
About YMinds.AI
YMinds.AI is a premier talent solutions company specializing in sourcing and delivering elite professionals in cutting-edge technologies and high-growth domains. We partner with global enterprises and innovative startups to accelerate their business expansion by connecting them with top-tier talent. Our clients operate at the forefront of data and technology innovation, and we enable their success by delivering exceptional professionals who create measurable impact.
Keywords
Data Engineer, ETL Pipelines, Big Data, Apache Spark, Apache Flink, Distributed Systems, Data Lake, Delta Lake, ScyllaDB, Aerospike, Scala, Python, Java, SQL, Data Warehousing, AWS S3, Azure Data Lake, Google BigQuery, Kafka, Airflow, AdTech
#DataEngineering #BigData #ApacheSpark #ETL #HiringNow #TechHiring #AdTech #DataJob
