Data Engineer
Alegeus
2 - 5 years
Bengaluru
Posted: 27/05/2026
Job Description
Role summary
We are looking for an Expert Software Engineer to design, build, and scale our next-generation Data Platform and Data-Driven APIs. This role combines distributed data processing (Apache Spark) with platform and microservices engineering (Java) to enable reliable, scalable, and real-time data access.
You will operate at the intersection of data engineering and backend platform engineering, building systems that not only process large volumes of data but also expose that data through robust, well-designed APIs and services.
This role goes beyond implementing requirements. We expect engineers to understand business context, challenge assumptions, and take end-to-end ownership of delivering meaningful outcomes.
Key responsibilities
Data Platform Engineering
- Design and develop scalable data pipelines using Apache Spark (batch and streaming)
- Build and maintain data platform layers: ingestion, transformation, and serving
- Optimize Spark jobs for performance, cost, and reliability (partitioning, skew handling, memory tuning)
- Implement data quality, observability, and lineage frameworks
- Contribute to data architecture decisions (Lakehouse, data mesh, storage formats, partition strategies)
- Define and enforce data contracts and schema evolution practices
Platform APIs & Backend Engineering
- Design and build data-driven platform APIs using Java (preferred)
- Develop microservices that expose curated datasets for product and partner consumption
- Implement RESTful APIs and event-driven services for real-time and near real-time data access
- Ensure low-latency, high-availability data serving layers
- Integrate with upstream/downstream systems, including legacy APIs where required
Cloud & Platform Integration
- Build and deploy solutions on Azure (preferred) / AWS / GCP
- Leverage cloud-native services for data storage, compute, and messaging
- Work with event streaming systems (Kafka/Event Hubs) for real-time pipelines
- Support containerized deployments and orchestration (Kubernetes) where applicable
Quality, Observability & Engineering Excellence
- Champion unit tests across both data and service layers
- Build automated validation frameworks for data pipelines
- Implement end-to-end observability (metrics, logging, tracing) across pipelines and APIs
- Drive CI/CD practices for both data and application code
- Conduct code reviews and enforce engineering best practices
Product Mindset & Ownership
- Engage deeply with product and business stakeholders to understand why, not just what
- Translate business problems into scalable data and platform solutions
- Take end-to-end ownership from design through production and support
- Proactively identify performance bottlenecks, data issues, and system gaps
Required qualifications (Hard requirements)
- 8+ years of software engineering experience with strong focus on data platforms and/or distributed systems
- Hands-on expertise in Apache Spark (Scala or PySpark)
- Strong programming skills in Java (preferred) / Scala / Python
- Experience building large-scale data pipelines (ETL/ELT)
- Experience developing backend services or APIs (REST/microservices)
- Deep understanding of:
  - Distributed systems (partitioning, shuffle, fault tolerance)
  - Data storage formats (Parquet, ORC, Avro)
  - Data modeling and schema evolution
- Experience with cloud platforms (Azure/AWS/GCP)
- Familiarity with workflow orchestration tools (Airflow, Dagster, etc.)
- Strong system design and performance optimization skills
Preferred qualifications
- Experience with Spark Structured Streaming
- Exposure to Lakehouse architectures (Delta Lake, Iceberg, Hudi)
- Experience with event-driven architectures (Kafka, Event Hubs)
- Knowledge of data governance, catalog, and lineage tools
- Experience with CI/CD for data and microservices
- Familiarity with Kubernetes and containerized workloads
- Experience designing low-latency data serving APIs
What success looks like
A successful engineer in this role will:
- Deliver high-quality, production-grade data pipelines and APIs that power real business outcomes
- Build systems that are scalable, observable, and resilient under load
- Take ownership end-to-end, ensuring data flows reliably from source to consumer
- Balance data correctness, performance, and cost efficiency
- Contribute to evolving a modern data platform integrated with product-facing services
