Lead AI Engineer

ProductSquads

5 - 10 years

Hyderabad

Posted: 21/02/2026

Job Description

About the Company



Lead AI Engineer (Agentic Systems & LLM Evaluation)



About the Role



We are hiring a Lead AI Engineer to build and scale intelligent validation systems for LLM-powered products and autonomous agents. You will design AI agents that test other AI agents, implement LLM-as-a-Judge frameworks, and build automated evaluation systems for RAG pipelines, reasoning engines, and agent workflows. If you are excited about building self-evaluating AI systems and validation agents that operate at scale, this role is for you.


Responsibilities

People Leadership & Capability Development

  • Lead and mentor a team of AI SDETs; build a high-performance, learning-focused engineering culture.
  • Drive capability growth in automation, AI evaluation frameworks, and modern quality engineering practices.
  • Conduct structured 1:1s, goal setting, and performance feedback to support career progression.
  • Partner with QA/Engineering leadership on hiring, onboarding, and workforce planning.


Agile Quality & Delivery Leadership

  • Embed AI quality strategy within Agile/Scrum teams; align test strategy with product risk and release readiness.
  • Integrate testability, acceptance criteria, evaluation baselines, and test data planning into sprint cycles.
  • Ensure Definition of Done includes measurable AI evaluation thresholds and guardrail validation.
  • Remove delivery impediments and drive cross-functional collaboration across Product, Engineering, and Security.
  • Maintain predictable, data-driven quality outcomes for AI-enabled features.


AI Strategy and Evaluation

  • Design and implement LLM-as-a-Judge evaluation frameworks for:
      • Output correctness
      • Groundedness & hallucination detection
      • Reasoning quality
      • Task completion accuracy
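To make the LLM-as-a-Judge idea concrete, here is a minimal sketch of such a framework in Python. The `make_judge` helper, the prompt wording, and the three score axes are illustrative assumptions, not details from this posting; in production the `call_llm` callable would wrap a real model API (GPT, Claude, etc.) rather than the stub shown here.

```python
import json
from dataclasses import dataclass
from typing import Callable

@dataclass
class Verdict:
    """Structured judge output: one 0-1 score per evaluation axis (axes are illustrative)."""
    correctness: float   # is the answer factually right?
    groundedness: float  # is every claim supported by the retrieved context?
    reasoning: float     # is the chain of reasoning coherent?

def make_judge(call_llm: Callable[[str], str]) -> Callable[[str, str, str], Verdict]:
    """Wrap any text-completion callable as a judge.

    `call_llm` takes a prompt string and returns the model's raw reply;
    the judge prompt requests JSON so the reply is machine-parseable.
    """
    def judge(question: str, context: str, answer: str) -> Verdict:
        prompt = (
            "You are an evaluation judge. Score the ANSWER to the QUESTION "
            "using only the CONTEXT. Reply with JSON: "
            '{"correctness": 0-1, "groundedness": 0-1, "reasoning": 0-1}\n'
            f"QUESTION: {question}\nCONTEXT: {context}\nANSWER: {answer}"
        )
        scores = json.loads(call_llm(prompt))
        return Verdict(**scores)
    return judge

# Demo with a stubbed model reply; a real deployment would call an LLM API here.
stub = lambda prompt: '{"correctness": 1.0, "groundedness": 0.8, "reasoning": 0.9}'
judge = make_judge(stub)
v = judge("What is the capital of France?", "France's capital is Paris.", "Paris.")
print(v.groundedness)  # 0.8 with the stub above
```

The key design choice is keeping the model call pluggable, so the same judge harness can score outputs from different LLMs or prompt versions.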

  • Build agentic QA systems that:
      • Validate other agents' decisions
      • Test tool-usage accuracy
      • Simulate adversarial user behaviour
      • Perform regression evaluation autonomously
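One simple form such a validation agent might take is replaying another agent's tool-call trace and flagging disallowed tools or malformed arguments. The tool whitelist, trace schema, and `validate_trace` function below are hypothetical illustrations, not anything specified in this posting.

```python
from typing import Any

# Hypothetical whitelist: tool name -> allowed argument names.
ALLOWED_TOOLS = {"search": {"query"}, "calculator": {"expression"}}

def validate_trace(trace: list[dict[str, Any]]) -> list[str]:
    """Return a list of violations found in an agent's tool-call trace."""
    violations = []
    for i, call in enumerate(trace):
        tool, args = call.get("tool"), call.get("args", {})
        if tool not in ALLOWED_TOOLS:
            violations.append(f"step {i}: unknown tool {tool!r}")
        elif set(args) - ALLOWED_TOOLS[tool]:
            violations.append(f"step {i}: unexpected args for {tool!r}")
    return violations

trace = [
    {"tool": "search", "args": {"query": "weather"}},
    {"tool": "shell", "args": {"cmd": "rm -rf /"}},  # simulated adversarial step
]
print(validate_trace(trace))  # flags step 1 as an unknown tool
```

Adversarial simulation then amounts to generating traces like the second step above and asserting the validator catches them.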

  • Create automated validation pipelines for:
      • RAG systems (retrieval scoring, faithfulness checks)
      • Prompt updates
      • Model upgrades
  • Develop evaluation agents using various LLM and AI tools.
  • Integrate evaluation pipelines into CI/CD for continuous AI regression detection.

  • Define AI quality metrics such as:
      • Hallucination rate
      • Retrieval precision & recall
      • Judge-consistency scoring
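The first two of these metrics have straightforward definitions, sketched below. The threshold value and function names are illustrative assumptions: hallucination rate here is the fraction of answers whose groundedness score falls below a cutoff, and retrieval precision/recall are the standard set-based definitions.

```python
def hallucination_rate(groundedness_scores: list[float], threshold: float = 0.5) -> float:
    """Fraction of answers whose groundedness falls below `threshold` (cutoff is an assumption)."""
    flagged = sum(1 for g in groundedness_scores if g < threshold)
    return flagged / len(groundedness_scores)

def retrieval_precision_recall(retrieved: list[str], relevant: list[str]) -> tuple[float, float]:
    """Set-based precision and recall for a retriever's returned document IDs."""
    retrieved_set, relevant_set = set(retrieved), set(relevant)
    hits = retrieved_set & relevant_set
    precision = len(hits) / len(retrieved_set) if retrieved_set else 0.0
    recall = len(hits) / len(relevant_set) if relevant_set else 0.0
    return precision, recall

print(hallucination_rate([0.9, 0.2, 0.8, 0.4]))  # 0.5: two of four answers below the cutoff
print(retrieval_precision_recall(["d1", "d2", "d3"], ["d2", "d3", "d4"]))
```

Judge-consistency scoring would typically run the same judge repeatedly (or multiple judges) over identical inputs and measure score agreement, but its exact definition varies by team.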


Qualifications

  • Bachelor's degree in Computer Science, Engineering, or equivalent experience
  • 8-12+ years in Software Engineering, Machine Learning, Data Science, or SDET/QA Automation, with strong exposure to AI systems, Generative AI, and AI/ML.


Required Skills

  • Hands-on experience with:
      • LLMs (GPT, Claude, Llama, Mistral)
      • RAG architectures
      • Agent frameworks (OpenAI, Claude A2A, AutoGen, CrewAI)

  • Proven experience leading Engineering teams, driving technical direction, mentoring engineers, and owning quality strategy across multiple squads.
  • Strong Python expertise.
  • Experience implementing LLM evaluation or LLM-as-a-Judge systems.
  • Experience building scalable automation infrastructure.
  • Strong communication and interpersonal skills.
  • Strong understanding of prompt engineering, hallucination risks, and model regression.


Preferred Skills


  • Experience with vector databases (Pinecone, Weaviate, FAISS).
  • Experience in AI observability (LangSmith, Arize, WhyLabs).
  • Experience building synthetic datasets for evaluation.
