Functional AI Tester - GenAI

MICHELIN

2 - 5 years

Pune

Posted: 31/01/2026

Job Description

About the Role

You will work on QA for GenAI features, including Retrieval-Augmented Generation (RAG), conversational AI, and agentic evaluations. The role centers on:

  • Systematic GenAI evaluation (qualitative and quantitative metrics)

  • ETL and data quality testing for the data flows that feed AI systems

  • Python-driven automated testing

This is a hands-on, collaborative position. You will partner with AI engineers, data engineers, and product teams to define measurable acceptance criteria and ship high-quality AI features.

Key Responsibilities

  • Test strategy and planning

    • Define risk-based test strategies and detailed test plans for GenAI features.

    • Establish clear acceptance criteria with stakeholders for functional, safety, and data quality aspects.

  • Python test automation

    • Build and maintain automated test suites using Python (e.g., PyTest, requests).

    • Implement reusable utilities for prompt/response validation, dataset management, and result scoring.

    • Create regression baselines and golden test sets to detect quality drift.

  • GenAI evaluation

    • Develop evaluation harnesses covering dimensions such as factuality, coherence, helpfulness, safety, bias, and toxicity.

    • Design prompt suites, scenario-based tests, and golden datasets for reproducible measurements.

    • Implement guardrail tests including prompt-injection resilience, unsafe content detection, and PII redaction checks.

    • Track quality metrics over time.

  • RAG and semantic retrieval testing

    • Verify alignment between retrieved sources and generated answers.

    • Design and run adversarial tests against retrieval and generation.

    • Measure retrieval relevance, precision/recall, grounding quality, and hallucination reduction.

  • API and application testing

    • Test REST endpoints supporting GenAI features (request/response contracts, error handling, timeouts).

  • ETL and data quality validation

    • Test ingestion and transformation logic; validate schema, constraints, and field-level rules.

    • Implement data profiling, reconciliation between sources and targets, and lineage checks.

    • Verify data privacy controls, masking, and retention policies across pipelines.

  • Non-functional testing

    • Performance and load testing focused on latency, throughput, concurrency, and rate limits for LLM calls.

    • Cost-aware testing (token usage, caching effectiveness) and timeout/retry behavior validation.

    • Reliability and resilience checks including error recovery and fallback behavior.

  • Share results and insights; recommend remediation and preventive actions.
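
As an illustration of the "retrieval relevance, precision/recall" item under RAG and semantic retrieval testing, the following is a minimal sketch and not a prescribed method. It assumes each golden test case lists the document IDs a query should retrieve; the function name, the document IDs, and the choice of k are hypothetical.

```python
from typing import Sequence, Set, Tuple


def precision_recall_at_k(
    retrieved: Sequence[str], relevant: Set[str], k: int
) -> Tuple[float, float]:
    """Return (precision@k, recall@k) for a single query.

    Assumes document IDs in `retrieved` are unique and ordered by rank.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    top_k = list(retrieved[:k])
    # Count how many of the top-k results appear in the golden relevant set.
    hits = len(set(top_k) & relevant)
    precision = hits / len(top_k) if top_k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical golden entry: IDs the retriever returned vs. IDs judged relevant.
retrieved_ids = ["doc-3", "doc-7", "doc-1", "doc-9"]
relevant_ids = {"doc-1", "doc-3", "doc-5"}
precision, recall = precision_recall_at_k(retrieved_ids, relevant_ids, k=3)
```

In a PyTest suite, values like these could be compared against thresholds agreed with stakeholders, so that a drop in retrieval quality fails the regression run.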

 

Required Qualifications

  • Experience

    • 5+ years in software QA, including test strategy, automation, and defect management.

    • 2+ years testing AI/ML or GenAI features, with hands-on evaluation design.

    • 4+ years testing ETL/data pipelines and data quality.

  • Technical skills

    • Python: Strong proficiency building automated tests and tooling (PyTest, requests, pydantic or similar).

    • API testing: REST contract testing, schema validation, negative testing.

    • GenAI evaluation: crafting prompt suites, golden datasets, rubric-based scoring, and automated evaluation pipelines.

    • RAG testing: retrieval relevance, grounding validation, chunking/indexing verification, and embedding checks.

    • ETL/data quality: schema and constraint validation, reconciliation, lineage awareness, data profiling.

  • Quality and governance

    • Understanding of LLM limitations and methods to detect/reduce hallucinations.

    • Safety and compliance testing including PII handling and prompt-injection resilience.

    • Strong analytical and debugging skills across services and data flows.

  • Soft skills

    • Excellent written and verbal communication; ability to translate quality goals into measurable criteria.

    • Collaboration with AI engineers, data engineers, and product stakeholders.

    • Organized, detail-oriented, and outcomes-focused.
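
To give a sense of the PII handling checks mentioned under Quality and governance, here is a rough sketch that scans a model response for unredacted email addresses and phone-like numbers. The regular expressions, the `PII_PATTERNS` table, and the `find_pii_leaks` helper are illustrative assumptions; a production check would likely need broader and locale-aware patterns.

```python
import re
from typing import Dict, List

# Illustrative patterns only; real PII detection usually needs more coverage.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "phone": re.compile(r"\+?\d[\d\s().-]{8,}\d"),
}


def find_pii_leaks(text: str) -> Dict[str, List[str]]:
    """Return the PII-like strings found in `text`, grouped by pattern name."""
    findings: Dict[str, List[str]] = {}
    for label, pattern in PII_PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            findings[label] = matches
    return findings


def test_response_is_redacted():
    # Hypothetical model output that has already passed through redaction.
    response = "Please contact [REDACTED_EMAIL] for further details."
    assert find_pii_leaks(response) == {}
```

A test like this can run over every response in a golden prompt suite, flagging any output where the redaction layer let a raw identifier through.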

 

Nice to Have

  • Experience with evaluation frameworks or tooling for LLMs and RAG quality measurement.

  • Experience creating synthetic datasets to stress specific behaviors.

About Company

Michelin is a global tire manufacturer known for its high-performance tires used in automobiles, trucks, and aircraft. The company is committed to sustainability, producing eco-friendly products and investing in technologies that improve fuel efficiency, safety, and environmental impact.
