AI & ML Engineer
Blessing Softtech
2 - 5 years
Pune City
Posted: 04/04/2026
Job Description
REQUIREMENTS:
- Highly motivated, well-read, and a prodigy.
- Deep experience with LLM application development: prompt engineering, RAG pipelines, tool/function calling, and agent architectures
- Hands-on experience with at least 2 of: OpenAI, Anthropic, Google Gemini, Groq, Mistral APIs
- Strong understanding of embedding models, vector databases, and retrieval evaluation (precision, recall, MRR, NDCG)
- Experience building evaluation frameworks for AI systems: not just accuracy metrics but conversation-level quality assessment
- Python proficiency with async programming (asyncio, aiohttp)
- Familiarity with real-time audio/voice systems is a strong plus
- Experience with LangChain/LangGraph agent patterns is a strong plus
ROLE/RESPONSIBILITIES:
- Build and maintain the evaluation framework for voice and chat agent quality: hallucination rate, tool selection accuracy, conversation success metrics, retrieval precision/recall, and end-to-end task completion rates
- Upgrade the RAG pipeline from a basic FAISS flat index + bge-small-en-v1.5 to a production-grade retrieval system with hybrid search (semantic + BM25), cross-encoder re-ranking, multi-document support, chunk quality scoring, and dynamic index updates
- Design and implement LLM routing intelligence: choosing between 5 configured providers (OpenAI, Groq, Anthropic, Google Gemini, Mistral) based on query complexity, latency requirements, cost constraints, and tool-calling capability
- Harden the guardrails system beyond the current regex + Llama Guard 3: add topic boundary enforcement, PII detection/redaction, hallucination detection on RAG responses, and output quality scoring
- Optimize voice pipeline latency end-to-end: STT TTFB, LLM TTFB, TTS TTFB, and total round-trip. Profile each provider combination and tune VAD parameters (start/stop thresholds, confidence, min volume) per language
- Build prompt engineering infrastructure: a version-controlled prompt registry, an A/B testing framework for system prompts, and systematic optimization based on eval results
- Develop conversation analytics: real-time sentiment tracking, intent classification, conversation outcome scoring, topic drift detection, and customer satisfaction prediction
- Implement human handoff intelligence: frustration detection, repeated failure patterns, scope-boundary detection, and handoff summary generation
Tech stack you will work with:
- Pipecat AI (real-time voice pipeline with frame processors, VAD, barge-in)
- LangChain + LangGraph (chat agent executor, tool calling, multi-agent orchestration)
- FAISS + FastEmbed (vector search, local embeddings with BAAI/bge-small-en-v1.5)
- Deepgram Nova-3, Google Cloud STT, AssemblyAI (speech-to-text)
- ElevenLabs, Cartesia Sonic-3, Google TTS, Deepgram TTS (text-to-speech)
- OpenAI GPT-4o, Groq Llama 3.3-70B, Anthropic Claude, Google Gemini 2.0 Flash, Mistral Small (LLMs)
- Llama Guard 3 via Groq (content safety), confusables library (homoglyph detection)
- MCP (Model Context Protocol) via Pipedream for external tool integration
Compensation:
CTC 10L ++
