Lead AI Engineer
ProductSquads
5 - 10 years
Hyderabad
Posted: 21/02/2026
Job Description
Lead AI Engineer (Agentic Systems & LLM Evaluation)
About the Role
We are hiring a Lead AI Engineer to build and scale intelligent validation systems for LLM-powered products and autonomous agents. You will design AI agents that test other AI agents, implement LLM-as-a-Judge frameworks, and build automated evaluation systems for RAG pipelines, reasoning engines, and agent workflows. If you are excited about building self-evaluating AI systems and validation agents that operate at scale, this role is for you.
Responsibilities
People Leadership & Capability Development
- Lead and mentor a team of AI SDETs; build a high-performance, learning-focused engineering culture.
- Drive capability growth in automation, AI evaluation frameworks, and modern quality engineering practices.
- Conduct structured 1:1s, goal setting, and performance feedback to support career progression.
- Partner with QA/Engineering leadership on hiring, onboarding, and workforce planning.
Agile Quality & Delivery Leadership
- Embed AI quality strategy within Agile/Scrum teams; align test strategy with product risk and release readiness.
- Integrate testability, acceptance criteria, evaluation baselines, and test data planning into sprint cycles.
- Ensure the Definition of Done includes measurable AI evaluation thresholds and guardrail validation (a minimal example of such a gate is sketched after this list).
- Remove delivery impediments and drive cross-functional collaboration across Product, Engineering, and Security.
- Maintain predictable, data-driven quality outcomes for AI-enabled features.
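For illustration, here is a minimal sketch of the kind of CI quality gate the Definition-of-Done bullet above implies. The metric names and threshold values are assumptions for the example, not ProductSquads' actual release criteria.

```python
# A minimal sketch of a CI quality gate, not ProductSquads' actual
# pipeline. Metric names and threshold values are illustrative.
import sys

# Hypothetical Definition-of-Done thresholds for an AI-enabled feature.
THRESHOLDS = {
    "hallucination_rate": 0.02,  # maximum allowed
    "retrieval_recall": 0.90,    # minimum required
    "judge_consistency": 0.85,   # minimum required
}

def gate(metrics: dict) -> int:
    """Return a non-zero exit code if any release-readiness threshold
    is violated, so the CI job fails and the merge is blocked."""
    failures = []
    if metrics["hallucination_rate"] > THRESHOLDS["hallucination_rate"]:
        failures.append("hallucination_rate above maximum")
    if metrics["retrieval_recall"] < THRESHOLDS["retrieval_recall"]:
        failures.append("retrieval_recall below minimum")
    if metrics["judge_consistency"] < THRESHOLDS["judge_consistency"]:
        failures.append("judge_consistency below minimum")
    for failure in failures:
        print(f"FAIL: {failure}", file=sys.stderr)
    return 1 if failures else 0

if __name__ == "__main__":
    # Example run with made-up numbers from a nightly evaluation job.
    sys.exit(gate({
        "hallucination_rate": 0.01,
        "retrieval_recall": 0.93,
        "judge_consistency": 0.88,
    }))
```

Wired into a CI job, a non-zero exit code blocks the release until the evaluation metrics recover.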
AI Strategy & Evaluation
- Design and implement LLM-as-a-Judge evaluation frameworks for:
  - Output correctness
  - Groundedness & hallucination detection
  - Reasoning quality
  - Task completion accuracy
- Build agentic QA systems that:
  - Validate other agents' decisions
  - Test tool-usage accuracy
  - Simulate adversarial user behaviour
  - Perform regression evaluation autonomously
- Create automated validation pipelines for:
  - RAG systems (retrieval scoring, faithfulness checks)
  - Prompt updates
  - Model upgrades
- Develop evaluation agents using a range of LLMs and AI tools.
- Integrate evaluation pipelines into CI/CD for continuous AI regression detection.
- Define AI quality metrics such as the following (illustrated in the sketch after this list):
  - Hallucination rate
  - Retrieval precision & recall
  - Judge-consistency scoring
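To make these concepts concrete, here is a minimal Python sketch of an LLM-as-a-Judge scorer together with two of the metrics listed above (judge-consistency and retrieval precision & recall). The prompt, the Verdict schema, and the call_llm callable are illustrative assumptions, not a prescribed framework; any provider SDK could back call_llm.

```python
# A minimal sketch, assuming a call_llm wrapper around whatever
# provider the team uses; prompt and schema are illustrative.
import json
from dataclasses import dataclass
from typing import Callable

JUDGE_PROMPT = """You are an evaluation judge. Given a QUESTION, the
retrieved CONTEXT, and a candidate ANSWER, return a JSON object with:
  "grounded": true/false (is every claim supported by CONTEXT?),
  "correct": true/false (does the answer address the question?),
  "score": 0-5 (overall quality).
QUESTION: {question}
CONTEXT: {context}
ANSWER: {answer}
Return only the JSON object."""

@dataclass
class Verdict:
    grounded: bool
    correct: bool
    score: int

def judge(question: str, context: str, answer: str,
          call_llm: Callable[[str], str]) -> Verdict:
    """Score one answer with an LLM judge. Assumes the judge model
    returns clean JSON as instructed by the prompt."""
    raw = call_llm(JUDGE_PROMPT.format(
        question=question, context=context, answer=answer))
    data = json.loads(raw)
    return Verdict(bool(data["grounded"]), bool(data["correct"]),
                   int(data["score"]))

def judge_consistency(question, context, answer, call_llm, n=5):
    """Judge-consistency: fraction of n repeated judgments that agree
    with the majority verdict (1.0 = a fully stable judge)."""
    votes = [judge(question, context, answer, call_llm).correct
             for _ in range(n)]
    majority = votes.count(True) >= n / 2
    return votes.count(majority) / n

def retrieval_precision_recall(retrieved_ids, relevant_ids):
    """Standard set-based retrieval precision and recall."""
    retrieved, relevant = set(retrieved_ids), set(relevant_ids)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall
```

In practice, call_llm would wrap an OpenAI or Anthropic client call, and verdicts would be aggregated over a labelled evaluation set rather than scored one at a time.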
Qualifications
- Bachelor's degree in Computer Science, Engineering, or equivalent experience.
- 8-12+ years in software engineering, machine learning, data science, or SDET/QA automation, with strong exposure to AI/ML and Generative AI systems.
Required Skills
- Hands-on experience with:
  - LLMs (GPT, Claude, Llama, Mistral)
  - RAG architectures
  - Agent frameworks (OpenAI, Claude A2A, AutoGen, CrewAI)
- Proven experience leading Engineering teams, driving technical direction, mentoring engineers, and owning quality strategy across multiple squads.
- Strong Python expertise.
- Experience implementing LLM evaluation or LLM-as-a-Judge systems.
- Experience building scalable automation infrastructure.
- Strong communication and interpersonal skills.
- Strong understanding of prompt engineering, hallucination risks, and model regression.
Preferred Skills
- Experience with vector databases (Pinecone, Weaviate, FAISS).
- Experience in AI observability (LangSmith, Arize, WhyLabs).
- Experience building synthetic datasets for evaluation.