
Data Engineer

CodeSpire Solutions

2 - 5 years

Noida

Posted: 06/03/2026



Job Description - Data Engineer

Location: Noida, India

Experience: Minimum 3 Years (Relevant and Hands-on)

Work Timing: Overlap with US Eastern hours required

ABOUT THE ROLE

We are seeking a skilled Data Engineer to join the AI & ML Engineering team at a leading financial institution. This role is central to building the data foundation for an AIOps Incident Assist platform, an AI-driven agent that provides first-responder intelligence to technical teams during major incidents. You will design and implement data pipelines that ingest, normalize, and correlate observability data (logs, traces, metrics) from multiple enterprise systems, creating the contextual backbone that powers intelligent recommendations and root-cause analysis.

The successful candidate will work within a highly secure, compliance-driven banking environment where security is of paramount importance. You will collaborate closely with architects and senior engineers to build production-grade pipelines that pass rigorous code reviews, security scans, and model risk management validation.


KEY RESPONSIBILITIES

Design and build data pipelines (ETL/ELT) to ingest observability data from sources including DataDog, Splunk, ServiceNow (ITSM), JIRA, Confluence, CMDB, and homegrown monitoring solutions.
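
To illustrate the shape of this work, here is a minimal extract-transform-load sketch. All names (`extract`, `transform`, `load`, the `"datadog"` source, the sample event) are hypothetical stand-ins; a real pipeline would call the DataDog/Splunk APIs in the extract step and write to Snowflake or Iceberg in the load step.

```python
import json
from datetime import datetime, timezone

def extract(source: str) -> list[dict]:
    # Stub: a real pipeline would page through the source's API here.
    return [{"source": source, "message": "disk usage 91%", "ts": "2026-03-06T10:00:00Z"}]

def transform(events: list[dict]) -> list[dict]:
    # Normalize raw events into a common schema shared by all sources.
    return [
        {
            "source": e["source"],
            "message": e["message"],
            "event_time": e["ts"],
            "ingested_at": datetime.now(timezone.utc).isoformat(),
        }
        for e in events
    ]

def load(records: list[dict]) -> str:
    # Stub: a real pipeline would write to the warehouse; here we serialize.
    return json.dumps(records)

payload = load(transform(extract("datadog")))
```

The separation into three pure functions is what makes such pipelines testable stage by stage.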

Normalize, filter, and reformat raw telemetry data (logs, traces, metrics) into structured formats consumable by LLMs and ML models, ensuring the right level of granularity for each use case.
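
As a concrete (and simplified) example of this normalization step, the sketch below parses a raw log line into named fields that an LLM prompt or downstream model can reference directly. The log format and service name are invented for illustration; production formats vary by source.

```python
import re

# Assumed line format: "<ISO timestamp> <LEVEL> <service> <message>"
LOG_PATTERN = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z)\s+"
    r"(?P<level>[A-Z]+)\s+(?P<service>\S+)\s+(?P<msg>.*)"
)

def normalize_log(line: str) -> dict:
    """Parse one raw log line into structured fields; keep unparsable lines raw."""
    m = LOG_PATTERN.match(line)
    if m is None:
        return {"raw": line, "parsed": False}
    return {**m.groupdict(), "parsed": True}

rec = normalize_log("2026-03-06T10:15:00Z ERROR payments-api connection pool exhausted")
```

Keeping unparsable lines (rather than dropping them) preserves granularity for use cases that need the raw text.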

Build and maintain a curated data lake using Snowflake and/or Apache Iceberg, capturing correlated incident context across multiple data entities.

Develop metadata models and multi-variate correlation logic to enable accurate anomaly detection, latency analysis, and root-cause isolation.
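
One of the simplest building blocks behind such anomaly detection is a z-score check on latency samples; the sketch below flags values far above the distribution mean. The threshold and the sample data are illustrative assumptions, not the platform's actual logic.

```python
from statistics import mean, stdev

def latency_anomalies(samples_ms: list[float], threshold: float = 2.0) -> list[float]:
    """Return samples more than `threshold` standard deviations above the mean."""
    mu, sigma = mean(samples_ms), stdev(samples_ms)
    if sigma == 0:
        return []  # constant series: nothing can be anomalous
    return [s for s in samples_ms if (s - mu) / sigma > threshold]

baseline = [100.0, 102.0, 98.0, 101.0, 99.0, 100.0, 103.0, 97.0, 900.0]
spikes = latency_anomalies(baseline)
```

Real correlation logic would combine signals like this across multiple entities (service, host, deployment) rather than a single series.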

Integrate data pipelines with the internal GDK and AK (Agent Acceleration Kit) SDKs, ensuring compliance with the organization's standardized AI development framework.

Implement comprehensive logging, audit trails, and traceability for all data flows and agent interactions, meeting production (P0) compliance requirements.

Package and deploy data services as containerized applications on Kubernetes across P2 (dev), P1 (staging), and P0 (production) environments.

Collaborate with the AI engineering team to define 5-10 initial use cases, identify required data entities, and scope the data integration work for each.

Support prototype development by simulating data pipelines in external environments and then replicating validated approaches within the client's secure framework.


REQUIRED QUALIFICATIONS

3+ years of hands-on experience as a Data Engineer building production data pipelines.

Strong proficiency in Python for data pipeline development, scripting, and automation.

Demonstrated experience with observability/operations data: an understanding of logs, traces, metrics, and how they interrelate in incident management workflows.

Experience with Snowflake or equivalent cloud data warehouse platforms (Redshift acceptable).

Solid understanding of ETL/ELT patterns, data normalization, and data modeling for operational/telemetry data.

Experience integrating with enterprise platforms such as ServiceNow, JIRA, Confluence, or equivalent ITSM and collaboration tools.

Familiarity with containerized deployments (Docker, Kubernetes) and CI/CD pipelines with multiple promotion stages.

Working knowledge of AWS cloud services.

Comfort working within strict security and compliance frameworks; experience in financial services or regulated industries is strongly preferred.


PREFERRED QUALIFICATIONS

Experience with AIOps, SRE, or IT operations automation use cases.

Exposure to vector databases (PostgreSQL with pgvector) and embedding workflows.
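
The core operation behind such embedding workflows is a nearest-neighbor search by vector similarity; in pgvector this comparison typically runs server-side in SQL via its cosine-distance operator. The pure-Python sketch below shows the same idea in miniature; the incident names and three-dimensional vectors are made up for illustration (real embeddings have hundreds of dimensions).

```python
from math import sqrt

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(x * x for x in b)))

# Hypothetical embeddings of past incidents vs. a new incident's embedding.
incident_vec = [0.9, 0.1, 0.0]
candidates = {"db-outage": [0.8, 0.2, 0.1], "cert-expiry": [0.0, 0.1, 0.9]}
best = max(candidates, key=lambda k: cosine_similarity(incident_vec, candidates[k]))
```

Retrieving the most similar past incidents this way is what grounds an LLM's recommendations in prior resolutions.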

Familiarity with LLM integration patterns, RAG architectures, or agentic AI frameworks.

Experience with OpenTelemetry standards and instrumentation.

Prior experience working with internal SDK/toolkits that enforce guardrails, PII masking, and kill-switch mechanisms.

Working knowledge of ML model evaluation, experiment tracking, and model risk documentation.

Experience with GitHub Copilot or similar AI-assisted coding tools for productivity.


TOOLS & TECHNOLOGY ENVIRONMENT

Languages: Python (primary)

Data Platform: Snowflake, Apache Iceberg, PostgreSQL (pgvector)

Cloud & Infra: AWS, Kubernetes, Docker, CI/CD with multi-stage promotion

Observability: DataDog, Splunk, homegrown monitoring, OpenTelemetry

ITSM & Collaboration: ServiceNow, JIRA, Confluence, Runbooks

AI/ML SDKs: Internal GDK & AK (Agent Acceleration Kit), GPT-4/4.5/5

IDE & Tooling: VS Code, GitHub Copilot

LLM Models: GPT-4, GPT-4.5 (GPT-5 planned Q1 2025)
