Introducing Open RAG Eval: Scalable RAG Evaluation without Golden Answers

Hi everyone, I'm Ofer from Vectara, and today I'm excited to introduce Open RAG Eval, an open-source project that changes how we evaluate RAG pipelines without relying on golden answers or golden chunks.

  • 1. Ofer from Vectara discusses Open RAG Eval, a new open-source project for quick and scalable RAG (Retrieval-Augmented Generation) evaluation.
  • 2. The project aims to solve the problem of requiring "golden answers" or "golden chunks" for RAG evaluation, which is not scalable.
  • 3. Open RAG Eval is research-backed, developed in collaboration with Jimmy Lin's lab at the University of Waterloo.
  • 4. Users start by collecting queries that are important for their RAG system; these queries can number in the tens, hundreds, or thousands.
  • 5. A RAG connector runs those queries through a RAG pipeline and collects its outputs; connectors exist for Vectara, LangChain, LlamaIndex, and other pipelines.
  • 6. Connectors generate RAG outputs, which are then evaluated using various metrics.
  • 7. Metrics are grouped into evaluators, forming the internal architecture of Open RAG Eval.
  • 8. Evaluators generate RAG evaluation files containing all the information needed to assess a RAG pipeline (a minimal end-to-end sketch appears after this list).
  • 9. Open RAG Eval includes several metrics that do not require golden answers:
  • * UMBRELA: A retrieval metric that uses an LLM to score each retrieved chunk or passage's relevance to the query on a 0-3 scale.
  • * AutoNuggetizer: A generation metric that creates atomic units ("nuggets") and assigns each a vitality rating.
  • * Citation Faithfulness: Measures the fidelity of citations in the response.
  • * Hallucination Detection: Checks if the entire response aligns with the retrieved content.
  • 10. UMBRELA correlates well with human judgment, providing confidence in results even without golden chunks (a prompt sketch appears after this list).
  • 11. AutoNuggetizer has three steps: nugget creation, vitality rating assignment, and an LLM-judge analysis that determines whether each selected nugget is fully or partially supported by the response (see the sketch after this list).
  • 12. Citation Faithfulness rates the support for each citation as full support, partial support, or no support (a sketch follows the list).
  • 13. Hallucination Detection uses Vectara's hallucination detection model (HHEM) to check that the response as a whole aligns with the retrieved content (see the sketch after this list).
  • 14. Open RAG Eval provides a user interface for visualizing evaluation results at openevaluation.ai.
  • 15. The UI displays queries, retrieval scores, and generation scores in an easy-to-understand format.
  • 16. Users can drag and drop RAG evaluation files onto the Open RAG Eval UI for visualization.
  • 17. Open RAG Eval is a powerful tool for optimizing and tuning RAG pipelines.
  • 18. The project is open source, allowing users to examine its inner workings and contribute connectors or other improvements.
  • 19. Transparency is a key benefit of Open RAG Eval, as the metrics are clear and understandable.
  • 20. Vectara contributes connectors for the Vectara, LangChain, and LlamaIndex pipelines.
  • 21. Users can contribute connectors for their own or other preferred RAG pipelines.
  • 22. Open RAG Eval welcomes questions, issues, and pull requests related to connectors or other aspects of the project.
  • 23. The presentation concludes by thanking the audience for their attention.
  • 24. The topic was Open RAG Eval, an open-source package for optimizing and tuning RAG pipelines using transparent and scalable evaluation methods.
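
To make the flow in items 4-8 concrete, here is a minimal sketch of that loop in Python. Every name below (RAGOutput, MyPipelineConnector, evaluate) is an illustrative placeholder, not the actual open-rag-eval API; consult the project's repository for real usage.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class RAGOutput:
    query: str
    retrieved_passages: list[str]  # chunks returned by the retriever
    generated_answer: str          # answer produced from those chunks

class MyPipelineConnector:
    """Connector: runs queries through your RAG pipeline (item 5)."""
    def run(self, query: str) -> RAGOutput:
        passages = ["..."]  # placeholder: call your retriever here
        answer = "..."      # placeholder: call your generator here
        return RAGOutput(query, passages, answer)

def evaluate(output: RAGOutput) -> dict:
    """Evaluator: bundles per-query metric scores (items 6-7)."""
    return {
        "umbrela": None,                # retrieval relevance, 0-3 (item 9)
        "nugget_coverage": None,        # AutoNuggetizer result (item 11)
        "citation_faithfulness": None,  # citation support (item 12)
        "hallucination": None,          # response/context alignment (item 13)
    }

queries = ["What does Open RAG Eval measure?"]  # item 4: your own queries
connector = MyPipelineConnector()

results = []
for q in queries:
    out = connector.run(q)
    results.append({"output": asdict(out), "scores": evaluate(out)})

# Item 8: persist a RAG evaluation file that a UI could visualize.
with open("rag_eval_results.json", "w") as f:
    json.dump(results, f, indent=2)
```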
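Item 9's UMBRELA metric can be approximated with one LLM call per (query, passage) pair. This sketch assumes the openai Python client and an arbitrary judge model (gpt-4o-mini); the prompt paraphrases the 0-3 rubric described in the video and is not the exact prompt Open RAG Eval ships.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

RUBRIC = """Grade how well the passage answers the query:
0 = unrelated, 1 = related but does not answer,
2 = partially answers, 3 = directly and fully answers.
Query: {query}
Passage: {passage}
Respond with a single digit (0-3)."""

def umbrela_grade(query: str, passage: str) -> int:
    """Return an UMBRELA-style relevance grade on the 0-3 scale."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # any capable judge model works here
        messages=[{"role": "user",
                   "content": RUBRIC.format(query=query, passage=passage)}],
    )
    return int(resp.choices[0].message.content.strip()[0])

print(umbrela_grade("How does Open RAG Eval score retrieval?",
                    "Retrieval is scored with UMBRELA on a 0-3 scale."))
```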
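Item 11's three AutoNuggetizer steps map naturally onto three judge calls. Again a sketch: the ask() helper, model choice, and prompts are assumptions paraphrased from the video; the real implementation lives in the open-rag-eval repository.

```python
from openai import OpenAI

client = OpenAI()

def ask(prompt: str) -> str:
    """Single-turn judge call (hypothetical helper)."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content.strip()

def create_nuggets(query: str, passages: list[str]) -> list[str]:
    # Step 1: extract atomic, query-relevant facts ("nuggets").
    text = ask("List the atomic facts in these passages that are relevant "
               f"to the query, one per line.\nQuery: {query}\n"
               f"Passages: {passages}")
    return [line.lstrip("-* ") for line in text.splitlines() if line.strip()]

def rate_vitality(query: str, nugget: str) -> str:
    # Step 2: rate each nugget's importance for answering the query.
    return ask("Is this fact vital or merely okay for answering the query? "
               f"Answer 'vital' or 'okay'.\nQuery: {query}\nFact: {nugget}")

def check_support(answer: str, nugget: str) -> str:
    # Step 3: judge whether the generated answer supports each selected
    # nugget fully, partially, or not at all.
    return ask("Does the answer convey this fact fully, partially, or not "
               f"at all? Answer 'full', 'partial', or 'none'.\n"
               f"Answer: {answer}\nFact: {nugget}")
```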
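For item 12, a citation-faithfulness check can walk the response sentence by sentence, find citation markers, and ask a judge whether the cited passage supports each claim. The [1]-style citation format, the scoring weights, and the prompt are all assumptions for illustration.

```python
import re
from openai import OpenAI

client = OpenAI()

def support_level(statement: str, passage: str) -> str:
    """Judge whether a cited passage supports a statement."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content":
                   "Does the passage support the statement? Answer exactly "
                   "one of: full, partial, none.\n"
                   f"Statement: {statement}\nPassage: {passage}"}],
    )
    return resp.choices[0].message.content.strip().lower()

def citation_faithfulness(answer: str, passages: dict[int, str]) -> float:
    """Average citation support; full=1.0, partial=0.5, none=0.0."""
    scores = []
    for sentence in re.split(r"(?<=[.!?])\s+", answer):
        for cid in re.findall(r"\[(\d+)\]", sentence):  # e.g. "... [1]."
            level = support_level(sentence, passages[int(cid)])
            scores.append({"full": 1.0, "partial": 0.5}.get(level, 0.0))
    return sum(scores) / len(scores) if scores else 0.0
```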
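Item 13's whole-response check can be reproduced with Vectara's openly released hallucination evaluation model (HHEM) on Hugging Face. The predict() call below follows the model card's documented usage at the time of writing; since it relies on trust_remote_code, verify against the current card before use.

```python
from transformers import AutoModelForSequenceClassification

# Vectara's open HHEM model; requires trust_remote_code per its model card.
model = AutoModelForSequenceClassification.from_pretrained(
    "vectara/hallucination_evaluation_model", trust_remote_code=True)

retrieved = "Open RAG Eval scores RAG pipelines without golden answers."
answer = "Open RAG Eval requires hand-written golden answers for scoring."

# Each pair is (evidence, generated response); the model returns a factual
# consistency score in [0, 1] - lower means more likely hallucinated.
score = float(model.predict([(retrieved, answer)])[0])
print(f"factual consistency: {score:.2f}")
```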

Source: AI Engineer via YouTube

❓ What do you think of the ideas shared in this video? Feel free to share your thoughts in the comments!