Introducing Open RAG Eval: Scalable RAG Evaluation without Golden Answers

Hi everyone, I'm Ofer from Vectara, and today I'm excited to introduce Open RAG Eval, an open-source project that changes how we evaluate RAG pipelines without relying on golden answers or golden chunks.

  • 1. Ofer from Vectara discusses Open RAG Eval, a new open-source project for quick and scalable RAG (Retrieval-Augmented Generation) evaluation.
  • 2. The project aims to solve the problem of requiring "golden answers" or "golden chunks" for RAG evaluation, which is not scalable.
  • 3. Open RAG Eval is research-backed, developed in collaboration with Jimmy Lin's lab at the University of Waterloo.
  • 4. Users start by collecting queries that are important for their RAG system; these queries can number in the tens, hundreds, or thousands.
  • 5. A RAG connector runs those queries through a RAG pipeline and collects its outputs; connectors exist for Vectara, LangChain, LlamaIndex, and other pipelines.
  • 6. Connectors generate RAG outputs, which are then evaluated using various metrics.
  • 7. Metrics are grouped into evaluators, forming the internal architecture of Open RAG Eval.
  • 8. Evaluators generate RAG evaluation files containing all the information needed to assess a RAG pipeline (a minimal end-to-end sketch appears after this list).
  • 9. Open RAG Eval includes several metrics that do not require golden answers:
  • * UMBRELA: A retrieval metric that uses an LLM to score each retrieved chunk or passage's relevance to the query on a 0-3 scale.
  • * AutoNuggetizer: A generation metric that creates atomic units ("nuggets") and assigns each a vitality rating.
  • * Citation Faithfulness: Measures the fidelity of citations in the response.
  • * Hallucination Detection: Checks if the entire response aligns with the retrieved content.
  • 10. UMBRELA correlates well with human judgment, providing confidence in results even without golden chunks (a prompt sketch appears after this list).
  • 11. AutoNuggetizer has three steps: nugget creation, vitality rating assignment, and an LLM-judge analysis that determines whether each selected nugget is fully or partially supported by the response (see the sketch after this list).
  • 12. Citation Faithfulness rates the support for each citation as full support, partial support, or no support (a sketch follows the list).
  • 13. Hallucination Detection uses Vectara's hallucination detection model (HHEM) to check that the response as a whole aligns with the retrieved content (see the sketch after this list).
  • 14. Open RAG Eval provides a user interface for visualizing evaluation results at openevaluation.ai.
  • 15. The UI displays queries, retrieval scores, and generation scores in an easy-to-understand format.
  • 16. Users can drag and drop RAG evaluation files onto the Open RAG Eval UI for visualization.
  • 17. Open RAG Eval is a powerful tool for optimizing and tuning RAG pipelines.
  • 18. The project is open source, allowing users to examine its inner workings and contribute connectors or other improvements.
  • 19. Transparency is a key benefit of Open RAG Eval, as the metrics are clear and understandable.
  • 20. Vectara contributes connectors for the Vectara, LangChain, and LlamaIndex pipelines.
  • 21. Users can contribute connectors for their own or other preferred RAG pipelines.
  • 22. Open RAG Eval welcomes questions, issues, and pull requests related to connectors or other aspects of the project.
  • 23. The presentation concludes by thanking the audience for their attention.
  • 24. The topic was Open RAG Eval, an open-source package for optimizing and tuning RAG pipelines using transparent and scalable evaluation methods.
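
To make the flow in items 4-8 concrete, here is a minimal sketch of that loop in Python. Every name below (RAGOutput, MyPipelineConnector, evaluate) is an illustrative placeholder, not the actual open-rag-eval API; consult the project's repository for real usage.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class RAGOutput:
    query: str
    retrieved_passages: list[str]  # chunks returned by the retriever
    generated_answer: str          # answer produced from those chunks

class MyPipelineConnector:
    """Connector: runs queries through your RAG pipeline (item 5)."""
    def run(self, query: str) -> RAGOutput:
        passages = ["..."]  # placeholder: call your retriever here
        answer = "..."      # placeholder: call your generator here
        return RAGOutput(query, passages, answer)

def evaluate(output: RAGOutput) -> dict:
    """Evaluator: bundles per-query metric scores (items 6-7)."""
    return {
        "umbrela": None,                # retrieval relevance, 0-3 (item 9)
        "nugget_coverage": None,        # AutoNuggetizer result (item 11)
        "citation_faithfulness": None,  # citation support (item 12)
        "hallucination": None,          # response/context alignment (item 13)
    }

queries = ["What does Open RAG Eval measure?"]  # item 4: your own queries
connector = MyPipelineConnector()

results = []
for q in queries:
    out = connector.run(q)
    results.append({"output": asdict(out), "scores": evaluate(out)})

# Item 8: persist a RAG evaluation file that a UI could visualize.
with open("rag_eval_results.json", "w") as f:
    json.dump(results, f, indent=2)
```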
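Item 9's UMBRELA metric can be approximated with one LLM call per (query, passage) pair. This sketch assumes the openai Python client and an arbitrary judge model (gpt-4o-mini); the prompt paraphrases the 0-3 rubric described in the video and is not the exact prompt Open RAG Eval ships.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

RUBRIC = """Grade how well the passage answers the query:
0 = unrelated, 1 = related but does not answer,
2 = partially answers, 3 = directly and fully answers.
Query: {query}
Passage: {passage}
Respond with a single digit (0-3)."""

def umbrela_grade(query: str, passage: str) -> int:
    """Return an UMBRELA-style relevance grade on the 0-3 scale."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # any capable judge model works here
        messages=[{"role": "user",
                   "content": RUBRIC.format(query=query, passage=passage)}],
    )
    return int(resp.choices[0].message.content.strip()[0])

print(umbrela_grade("How does Open RAG Eval score retrieval?",
                    "Retrieval is scored with UMBRELA on a 0-3 scale."))
```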
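Item 11's three AutoNuggetizer steps map naturally onto three judge calls. Again a sketch: the ask() helper, model choice, and prompts are assumptions paraphrased from the video; the real implementation lives in the open-rag-eval repository.

```python
from openai import OpenAI

client = OpenAI()

def ask(prompt: str) -> str:
    """Single-turn judge call (hypothetical helper)."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content.strip()

def create_nuggets(query: str, passages: list[str]) -> list[str]:
    # Step 1: extract atomic, query-relevant facts ("nuggets").
    text = ask("List the atomic facts in these passages that are relevant "
               f"to the query, one per line.\nQuery: {query}\n"
               f"Passages: {passages}")
    return [line.lstrip("-* ") for line in text.splitlines() if line.strip()]

def rate_vitality(query: str, nugget: str) -> str:
    # Step 2: rate each nugget's importance for answering the query.
    return ask("Is this fact vital or merely okay for answering the query? "
               f"Answer 'vital' or 'okay'.\nQuery: {query}\nFact: {nugget}")

def check_support(answer: str, nugget: str) -> str:
    # Step 3: judge whether the generated answer supports each selected
    # nugget fully, partially, or not at all.
    return ask("Does the answer convey this fact fully, partially, or not "
               f"at all? Answer 'full', 'partial', or 'none'.\n"
               f"Answer: {answer}\nFact: {nugget}")
```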
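For item 12, a citation-faithfulness check can walk the response sentence by sentence, find citation markers, and ask a judge whether the cited passage supports each claim. The [1]-style citation format, the scoring weights, and the prompt are all assumptions for illustration.

```python
import re
from openai import OpenAI

client = OpenAI()

def support_level(statement: str, passage: str) -> str:
    """Judge whether a cited passage supports a statement."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content":
                   "Does the passage support the statement? Answer exactly "
                   "one of: full, partial, none.\n"
                   f"Statement: {statement}\nPassage: {passage}"}],
    )
    return resp.choices[0].message.content.strip().lower()

def citation_faithfulness(answer: str, passages: dict[int, str]) -> float:
    """Average citation support; full=1.0, partial=0.5, none=0.0."""
    scores = []
    for sentence in re.split(r"(?<=[.!?])\s+", answer):
        for cid in re.findall(r"\[(\d+)\]", sentence):  # e.g. "... [1]."
            level = support_level(sentence, passages[int(cid)])
            scores.append({"full": 1.0, "partial": 0.5}.get(level, 0.0))
    return sum(scores) / len(scores) if scores else 0.0
```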
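Item 13's whole-response check can be reproduced with Vectara's openly released hallucination evaluation model (HHEM) on Hugging Face. The predict() call below follows the model card's documented usage at the time of writing; since it relies on trust_remote_code, verify against the current card before use.

```python
from transformers import AutoModelForSequenceClassification

# Vectara's open HHEM model; requires trust_remote_code per its model card.
model = AutoModelForSequenceClassification.from_pretrained(
    "vectara/hallucination_evaluation_model", trust_remote_code=True)

retrieved = "Open RAG Eval scores RAG pipelines without golden answers."
answer = "Open RAG Eval requires hand-written golden answers for scoring."

# Each pair is (evidence, generated response); the model returns a factual
# consistency score in [0, 1] - lower means more likely hallucinated.
score = float(model.predict([(retrieved, answer)])[0])
print(f"factual consistency: {score:.2f}")
```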

Source: AI Engineer via YouTube

❓ What do you think of the ideas shared in this video? Feel free to share your thoughts in the comments!