Lead AI Engineer
ProductSquads
5 - 10 years
Hyderabad
Posted: 21/02/2026
Job Description
Lead AI Engineer (Agentic Systems & LLM Evaluation)
About the Role
We are hiring a Lead AI Engineer to build and scale intelligent validation systems for LLM-powered products and autonomous agents. You will design AI agents that test other AI agents, implement LLM-as-a-Judge frameworks, and build automated evaluation systems for RAG pipelines, reasoning engines, and agent workflows. If you are excited about building self-evaluating AI systems and validation agents that operate at scale, this role is for you.
Responsibilities
People Leadership & Capability Development
- Lead and mentor a team of AI SDETs; build a high-performance, learning-focused engineering culture.
- Drive capability growth in automation, AI evaluation frameworks, and modern quality engineering practices.
- Conduct structured 1:1s, goal setting, and performance feedback to support career progression.
- Partner with QA/Engineering leadership on hiring, onboarding, and workforce planning.
Agile Quality & Delivery Leadership
- Embed AI quality strategy within Agile/Scrum teams; align test strategy with product risk and release readiness.
- Integrate testability, acceptance criteria, evaluation baselines, and test data planning into sprint cycles.
- Ensure the Definition of Done includes measurable AI evaluation thresholds and guardrail validation (a minimal example of such a gate is sketched after this list).
- Remove delivery impediments and drive cross-functional collaboration across Product, Engineering, and Security.
- Maintain predictable, data-driven quality outcomes for AI-enabled features.
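For illustration, here is a minimal sketch of the kind of CI quality gate the Definition-of-Done bullet above implies. The metric names and threshold values are assumptions for the example, not ProductSquads' actual release criteria.

```python
# A minimal sketch of a CI quality gate, not ProductSquads' actual
# pipeline. Metric names and threshold values are illustrative.
import sys

# Hypothetical Definition-of-Done thresholds for an AI-enabled feature.
THRESHOLDS = {
    "hallucination_rate": 0.02,  # maximum allowed
    "retrieval_recall": 0.90,    # minimum required
    "judge_consistency": 0.85,   # minimum required
}

def gate(metrics: dict) -> int:
    """Return a non-zero exit code if any release-readiness threshold
    is violated, so the CI job fails and the merge is blocked."""
    failures = []
    if metrics["hallucination_rate"] > THRESHOLDS["hallucination_rate"]:
        failures.append("hallucination_rate above maximum")
    if metrics["retrieval_recall"] < THRESHOLDS["retrieval_recall"]:
        failures.append("retrieval_recall below minimum")
    if metrics["judge_consistency"] < THRESHOLDS["judge_consistency"]:
        failures.append("judge_consistency below minimum")
    for failure in failures:
        print(f"FAIL: {failure}", file=sys.stderr)
    return 1 if failures else 0

if __name__ == "__main__":
    # Example run with made-up numbers from a nightly evaluation job.
    sys.exit(gate({
        "hallucination_rate": 0.01,
        "retrieval_recall": 0.93,
        "judge_consistency": 0.88,
    }))
```

Wired into a CI job, a non-zero exit code blocks the release until the evaluation metrics recover.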
AI Strategy & Evaluation
- Design and implement LLM-as-a-Judge evaluation frameworks for:
  - Output correctness
  - Groundedness & hallucination detection
  - Reasoning quality
  - Task completion accuracy
- Build agentic QA systems that:
  - Validate other agents' decisions
  - Test tool-usage accuracy
  - Simulate adversarial user behaviour
  - Perform regression evaluation autonomously
- Create automated validation pipelines for:
  - RAG systems (retrieval scoring, faithfulness checks)
  - Prompt updates
  - Model upgrades
- Develop evaluation agents using a range of LLMs and AI tools.
- Integrate evaluation pipelines into CI/CD for continuous AI regression detection.
- Define AI quality metrics such as the following (illustrated in the sketch after this list):
  - Hallucination rate
  - Retrieval precision & recall
  - Judge-consistency scoring
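To make these concepts concrete, here is a minimal Python sketch of an LLM-as-a-Judge scorer together with two of the metrics listed above (judge-consistency and retrieval precision & recall). The prompt, the Verdict schema, and the call_llm callable are illustrative assumptions, not a prescribed framework; any provider SDK could back call_llm.

```python
# A minimal sketch, assuming a call_llm wrapper around whatever
# provider the team uses; prompt and schema are illustrative.
import json
from dataclasses import dataclass
from typing import Callable

JUDGE_PROMPT = """You are an evaluation judge. Given a QUESTION, the
retrieved CONTEXT, and a candidate ANSWER, return a JSON object with:
  "grounded": true/false (is every claim supported by CONTEXT?),
  "correct": true/false (does the answer address the question?),
  "score": 0-5 (overall quality).
QUESTION: {question}
CONTEXT: {context}
ANSWER: {answer}
Return only the JSON object."""

@dataclass
class Verdict:
    grounded: bool
    correct: bool
    score: int

def judge(question: str, context: str, answer: str,
          call_llm: Callable[[str], str]) -> Verdict:
    """Score one answer with an LLM judge. Assumes the judge model
    returns clean JSON as instructed by the prompt."""
    raw = call_llm(JUDGE_PROMPT.format(
        question=question, context=context, answer=answer))
    data = json.loads(raw)
    return Verdict(bool(data["grounded"]), bool(data["correct"]),
                   int(data["score"]))

def judge_consistency(question, context, answer, call_llm, n=5):
    """Judge-consistency: fraction of n repeated judgments that agree
    with the majority verdict (1.0 = a fully stable judge)."""
    votes = [judge(question, context, answer, call_llm).correct
             for _ in range(n)]
    majority = votes.count(True) >= n / 2
    return votes.count(majority) / n

def retrieval_precision_recall(retrieved_ids, relevant_ids):
    """Standard set-based retrieval precision and recall."""
    retrieved, relevant = set(retrieved_ids), set(relevant_ids)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

In practice, call_llm would wrap an OpenAI or Anthropic client call, and verdicts would be aggregated over a labelled evaluation set rather than scored one at a time.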
Qualifications
- Bachelor's degree in Computer Science, Engineering, or equivalent experience.
- 8-12+ years in software engineering, machine learning, data science, or SDET/QA automation, with strong exposure to AI/ML and Generative AI systems.
Required Skills
- Hands-on experience with:
  - LLMs (GPT, Claude, Llama, Mistral)
  - RAG architectures
  - Agent frameworks (OpenAI, Claude A2A, AutoGen, CrewAI)
- Proven experience leading Engineering teams, driving technical direction, mentoring engineers, and owning quality strategy across multiple squads.
- Strong Python expertise.
- Experience implementing LLM evaluation or LLM-as-a-Judge systems.
- Experience building scalable automation infrastructure.
- Strong communication and interpersonal skills.
- Strong understanding of prompt engineering, hallucination risks, and model regression.
Preferred Skills
- Experience with vector databases (Pinecone, Weaviate, FAISS).
- Experience in AI observability (LangSmith, Arize, WhyLabs).
- Experience building synthetic datasets for evaluation.