Role summary

We are looking for an Expert Software Engineer to design, build, and scale our next-generation Data Platform and Data-Driven APIs. This role combines distributed data processing (Apache Spark) with platform and microservices engineering (Java) to enable reliable, scalable, and real-time data access.

You will operate at the intersection of data engineering and backend platform engineering, building systems that not only process large volumes of data but also expose that data through robust, well-designed APIs and services.

This role goes beyond implementing requirements. We expect engineers to understand business context, challenge assumptions, and take end-to-end ownership of delivering meaningful outcomes.

Key responsibilities

Data Platform Engineering

Design and develop scalable data pipelines using Apache Spark (batch and streaming)
Build and maintain data platform layers: ingestion, transformation, and serving
Optimize Spark jobs for performance, cost, and reliability (partitioning, skew handling, memory tuning)
Implement data quality, observability, and lineage frameworks
Contribute to data architecture decisions (Lakehouse, data mesh, storage formats, partition strategies)
Define and enforce data contracts and schema evolution practices

Platform APIs & Backend Engineering

Design and build data-driven platform APIs using Java (preferred)
Develop microservices that expose curated datasets for product and partner consumption
Implement RESTful APIs and event-driven services for real-time and near real-time data access
Ensure low-latency, high-availability data serving layers
Integrate with upstream/downstream systems, including legacy APIs where required

Cloud & Platform Integration

Build and deploy solutions on Azure (preferred) / AWS / GCP
Leverage cloud-native services for data storage, compute, and messaging
Work with event streaming systems (Kafka/Event Hubs) for real-time pipelines
Support containerized deployments and orchestration (Kubernetes) where applicable

Quality, Observability & Engineering Excellence

Champion unit testing across both data and service layers
Build automated validation frameworks for data pipelines
Implement end-to-end observability (metrics, logging, tracing) across pipelines and APIs
Drive CI/CD practices for both data and application code
Conduct code reviews and enforce engineering best practices

Product Mindset & Ownership

Engage deeply with product and business stakeholders to understand the why, not just the what
Translate business problems into scalable data and platform solutions
Take end-to-end ownership from design through production and support
Proactively identify performance bottlenecks, data issues, and system gaps

Required qualifications (hard requirements)

8+ years of software engineering experience with a strong focus on data platforms and/or distributed systems
Hands-on expertise in Apache Spark (Scala or PySpark)
Strong programming skills in Java (preferred) / Scala / Python
Experience building large-scale data pipelines (ETL/ELT)
Experience developing backend services or APIs (REST/microservices)
Deep understanding of:
Distributed systems (partitioning, shuffle, fault tolerance)
Data storage formats (Parquet, ORC, Avro)
Data modeling and schema evolution
Experience with cloud platforms (Azure/AWS/GCP)
Familiarity with workflow orchestration tools (Airflow, Dagster, etc.)
Strong system design and performance optimization skills

Preferred qualifications

Experience with Spark Structured Streaming
Exposure to Lakehouse architectures (Delta Lake, Iceberg, Hudi)
Experience with event-driven architectures (Kafka, Event Hubs)
Knowledge of data governance, catalog, and lineage tools
Experience with CI/CD for data and microservices
Familiarity with Kubernetes and containerized workloads
Experience designing low-latency data serving APIs

What success looks like

A successful engineer in this role will:

Deliver high-quality, production-grade data pipelines and APIs that power real business outcomes
Build systems that are scalable, observable, and resilient under load
Take end-to-end ownership, ensuring data flows reliably from source to consumer
Balance data correctness, performance, and cost efficiency
Contribute to evolving a modern data platform integrated with product-facing services

Data Engineer

Alegeus