Data Engineer

Alegeus

2 - 5 years

Bengaluru

Posted: 27/05/2026

Job Description

Role summary

We are looking for an Expert Software Engineer to design, build, and scale our next-generation Data Platform and Data-Driven APIs. This role combines distributed data processing (Apache Spark) with platform and microservices engineering (Java) to enable reliable, scalable, and real-time data access.

You will operate at the intersection of data engineering and backend platform engineering, building systems that not only process large volumes of data but also expose that data through robust, well-designed APIs and services.

This role goes beyond implementing requirements. We expect engineers to understand business context, challenge assumptions, and take end-to-end ownership of delivering meaningful outcomes.


Key responsibilities

Data Platform Engineering

  • Design and develop scalable data pipelines using Apache Spark (batch and streaming)
  • Build and maintain data platform layers: ingestion, transformation, and serving
  • Optimize Spark jobs for performance, cost, and reliability (partitioning, skew handling, memory tuning)
  • Implement data quality, observability, and lineage frameworks
  • Contribute to data architecture decisions (Lakehouse, data mesh, storage formats, partition strategies)
  • Define and enforce data contracts and schema evolution practices


Platform APIs & Backend Engineering

  • Design and build data-driven platform APIs using Java (preferred)
  • Develop microservices that expose curated datasets for product and partner consumption
  • Implement RESTful APIs and event-driven services for real-time and near real-time data access
  • Ensure low-latency, high-availability data serving layers
  • Integrate with upstream/downstream systems, including legacy APIs where required

Cloud & Platform Integration

  • Build and deploy solutions on Azure (preferred) / AWS / GCP
  • Leverage cloud-native services for data storage, compute, and messaging
  • Work with event streaming systems (Kafka/Event Hubs) for real-time pipelines
  • Support containerized deployments and orchestration (Kubernetes) where applicable

Quality, Observability & Engineering Excellence

  • Champion unit tests across both data and service layers
  • Build automated validation frameworks for data pipelines
  • Implement end-to-end observability (metrics, logging, tracing) across pipelines and APIs
  • Drive CI/CD practices for both data and application code
  • Conduct code reviews and enforce engineering best practices

Product Mindset & Ownership

  • Engage deeply with product and business stakeholders to understand why, not just what
  • Translate business problems into scalable data and platform solutions
  • Take end-to-end ownership from design through production and support
  • Proactively identify performance bottlenecks, data issues, and system gaps

Required qualifications (Hard requirements)

  • 8+ years of software engineering experience with strong focus on data platforms and/or distributed systems
  • Hands-on expertise in Apache Spark (Scala or PySpark)
  • Strong programming skills in Java (preferred) / Scala / Python
  • Experience building large-scale data pipelines (ETL/ELT)
  • Experience developing backend services or APIs (REST/microservices)
  • Deep understanding of:
      • Distributed systems (partitioning, shuffle, fault tolerance)
      • Data storage formats (Parquet, ORC, Avro)
      • Data modeling and schema evolution
  • Experience with cloud platforms (Azure/AWS/GCP)
  • Familiarity with workflow orchestration tools (Airflow, Dagster, etc.)
  • Strong system design and performance optimization skills


Preferred qualifications

  • Experience with Spark Structured Streaming
  • Exposure to Lakehouse architectures (Delta Lake, Iceberg, Hudi)
  • Experience with event-driven architectures (Kafka, Event Hubs)
  • Knowledge of data governance, catalog, and lineage tools
  • Experience with CI/CD for data and microservices
  • Familiarity with Kubernetes and containerized workloads
  • Experience designing low-latency data serving APIs


What success looks like

A successful engineer in this role will:

  • Deliver high-quality, production-grade data pipelines and APIs that power real business outcomes
  • Build systems that are scalable, observable, and resilient under load
  • Take ownership end-to-end, ensuring data flows reliably from source to consumer
  • Balance data correctness, performance, and cost efficiency
  • Contribute to evolving a modern data platform integrated with product-facing services
