Data Engineer (Python + AI/ML Pipelines)

Job Title: Data Engineer (Python + AI/ML Pipelines)

Company Location: Kharadi, Pune (Work-from-office / Hybrid as per company policy)

CTC: 15 LPA (including variables & statutory bonus)

Experience Required: 5-9 years of hands-on Data Engineering experience (candidates with 10+ years will not be considered)

Notice Period: Immediate to 30 days joiners only

Key Highlights:

High-visibility role directly powering AI-driven automation projects

Modern AI/ML tech stack with heavy use of LLMs, RAG, vector DBs & real-time pipelines

Fast-growing product/SI organization with excellent learning & growth opportunities

Job Description:

We are looking for a strong Data Engineer with deep Python expertise to design, build and maintain scalable, production-grade data pipelines that fuel AI & automation initiatives for global clients.

Key Responsibilities:

Design & develop robust, scalable data pipelines for AI/ML and automation use cases

Build and optimize ETL/ELT workflows for model training, fine-tuning and inference

Ingest and integrate structured & unstructured data from multiple sources (APIs, databases, cloud storage, logs, etc.)

Collaborate closely with Data Scientists, AI Engineers & business teams to deliver clean, model-ready datasets

Implement data quality checks, validation frameworks and monitoring alerts

Support MLOps: model deployment, versioning, monitoring and retraining pipelines

Work with real-time streaming (Kafka, Kinesis, Pulsar) and vector databases (Pinecone, Weaviate, Chroma, etc.)

Ensure data security, governance and compliance (GDPR, SOC2, etc.)

Continuously improve pipeline performance, cost and reliability

Must-Have Skills & Experience:

5+ years of hands-on Data Engineering experience

Expert-level Python programming (pandas, NumPy, PySpark, etc.)

Very strong SQL and database skills

Hands-on experience with at least one major cloud: Azure (preferred) / AWS / GCP

Experience with Azure Data Factory, Databricks, Synapse, or AWS Glue, EMR, etc.

Workflow orchestration tools: Airflow (must), dbt, Prefect, Dagster

Good understanding of AI/ML lifecycles and MLOps practices

Exposure to TensorFlow / PyTorch / LangChain / LlamaIndex is a big plus

Experience with real-time data streaming (Kafka/Kinesis) and vector databases is highly desirable

Solid grasp of data modeling, warehousing and lakehouse architectures

Good to Have:

Prompt engineering & RAG pipeline experience

Exposure to Kubernetes, Docker, Terraform

Contribution to open-source data/AI projects

Immediate joiners or candidates serving a notice period of at most 30 days will be given strong preference.

How to Apply

Mail your updated resume with the subject line:

Data Engineer Python AI Pipelines Pune (Your Name) (Current CTC) (Expected CTC) (Notice Period) to:

Regards,

Victor Paul

Talent Acquisition Team

Live Connections