Building an Enterprise-Scale RAG Stack: Seven Pitfalls to Avoid When Going DIY

Welcome to my discussion on the hidden costs of building your own RAG stack, where I'll share seven pitfalls to avoid when deploying retrieval-augmented generation at scale.

  • 1. The speaker is Ofer, who works in developer relations at Vectara and has a background in machine learning and software engineering.
  • 2. They will be discussing the hidden costs of building your own RAG (Retrieval-Augmented Generation) stack.
  • 3. A RAG stack is a way to use large language models (LLMs) with your own data, rather than calling the LLM directly. This is done by using a retrieval engine to find the most relevant facts from your data and then passing those facts to the LLM as context for generating the answer (a minimal retrieve-then-generate sketch appears after this list).
  • 4. Building an enterprise-scale RAG platform is much harder than it may seem, with many different components involved.
  • 5. Vectara offers a RAG-as-a-service, which includes all of the components of a RAG stack and allows users to index and query data via its external APIs.
  • 6. There is a significant difference between building your own DIY RAG stack and using a platform like Vectara.
  • 7. Some potential pitfalls of building your own RAG stack include:
  • * Quality of responses/hallucinations: It is important to invest in parsing, chunking, hybrid search, and other retrieval components to ensure high-quality results and reduce hallucinations (see the hybrid search sketch after this list).
  • * Latency: Multiple components in a RAG stack can lead to higher latency, which may not be immediately apparent during initial development.
  • * Scaling and cost: As the number of documents and users increases, so does the cost of GPUs, CPUs, and storage.
  • * Security and compliance: It is important to implement attribute-based access control and properly handle sensitive information in a RAG stack (see the access-control sketch after this list).
  • * Vendor chaos: Using multiple vendors for different components of a RAG stack leads to integration headaches and makes problems harder to diagnose.
  • * Unsustainable expertise: Building and maintaining a RAG stack requires a unique set of skills that may be difficult to find and retain.
  • * Non-English support: It is important to consider the language needs of your users when building a RAG stack.
  • 8. Vectara's RAG-as-a-service includes all of the components of a RAG stack and allows users to upload and index data, as well as run queries and chat through the platform.
  • 9. Vectara focuses on accuracy, good retrieval, security mechanisms, and observability in its RAG-as-a-service.
  • 10. The company also provides a hallucination evaluation model (HHEM) to help users understand whether the LLM is doing a good job and to reduce hallucinations (see the scoring sketch after this list).
  • 11. Vectara's HHEM is open source and has been downloaded over 3 million times on Hugging Face.
  • 12. The company also maintains a leaderboard ranking LLMs by their likelihood to hallucinate, which can help in choosing the right model for your needs.
  • 13. Vectara's RAG-as-a-service is also available as a self-hosted option, since many customers require on-premises deployment.
  • 14. The speaker encourages listeners to try Vectara and offers a QR code for a 30-day free trial with full capabilities.
  • 15. They also invite listeners to reach out if they are interested in learning more about Vectara.
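
To make point 3 concrete, here is a minimal retrieve-then-generate sketch in Python. It is not Vectara's API: the embedding model, the toy documents, and the prompt template are illustrative assumptions.

```python
# Minimal RAG sketch: retrieve the most relevant facts, then build a
# grounded prompt for the LLM. Every component here is a placeholder.
from sentence_transformers import SentenceTransformer, util

embedder = SentenceTransformer("all-MiniLM-L6-v2")  # any embedding model works

documents = [
    "Vectara provides RAG as a managed service.",
    "Hybrid search combines keyword and vector retrieval.",
    "HHEM scores whether a generated answer is supported by its sources.",
]
doc_embeddings = embedder.encode(documents, convert_to_tensor=True)

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k documents most similar to the query."""
    query_embedding = embedder.encode(query, convert_to_tensor=True)
    hits = util.semantic_search(query_embedding, doc_embeddings, top_k=k)[0]
    return [documents[hit["corpus_id"]] for hit in hits]

def build_prompt(query: str, facts: list[str]) -> str:
    """Ask the LLM to answer using only the retrieved facts."""
    context = "\n".join(f"- {fact}" for fact in facts)
    return (
        "Answer the question using only the facts below.\n"
        f"Facts:\n{context}\n\nQuestion: {query}\nAnswer:"
    )

query = "What is hybrid search?"
prompt = build_prompt(query, retrieve(query))
print(prompt)  # this prompt would then be sent to the LLM of your choice
```

The point of the flow is that the LLM only sees the retrieved facts, which is what keeps its answers grounded in your own data rather than in whatever it memorized during training.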
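
The first pitfall above names hybrid search as one of the quality investments. A common way to implement it is to blend lexical (BM25) and vector-similarity scores; the 50/50 weighting and the libraries below are assumptions for illustration, not the talk's exact recipe.

```python
# Hybrid search sketch: blend lexical (BM25) and vector-similarity scores.
# The weighting, models, and toy corpus are illustrative assumptions.
from rank_bm25 import BM25Okapi
from sentence_transformers import SentenceTransformer, util

documents = [
    "Quarterly revenue grew 12% year over year.",
    "The data retention policy was updated in March.",
    "Employees must complete security training annually.",
]
bm25 = BM25Okapi([doc.lower().split() for doc in documents])

embedder = SentenceTransformer("all-MiniLM-L6-v2")
doc_vectors = embedder.encode(documents, convert_to_tensor=True)

def hybrid_search(query: str, alpha: float = 0.5, k: int = 2) -> list[str]:
    """Rank documents by alpha * vector score + (1 - alpha) * normalized BM25."""
    bm25_scores = bm25.get_scores(query.lower().split())
    bm25_norm = bm25_scores / max(bm25_scores.max(), 1e-9)
    query_vector = embedder.encode(query, convert_to_tensor=True)
    vec_scores = util.cos_sim(query_vector, doc_vectors)[0]
    combined = [
        alpha * float(vec_scores[i]) + (1 - alpha) * float(bm25_norm[i])
        for i in range(len(documents))
    ]
    ranked = sorted(range(len(documents)), key=lambda i: combined[i], reverse=True)
    return [documents[i] for i in ranked[:k]]

print(hybrid_search("when was the retention policy changed?"))
```

Lexical scores catch exact terms such as product names or error codes, while vector scores catch paraphrases; blending both usually retrieves better chunks than either alone.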
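
For the security and compliance pitfall, here is a minimal sketch of attribute-based access control applied at retrieval time, so that chunks a user is not allowed to see never reach the LLM. The attributes (department, clearance level) are hypothetical; a real deployment would mirror your own policy model.

```python
# ABAC sketch: filter retrieved chunks by the requesting user's attributes
# before they are passed to the LLM. Attribute names are hypothetical.
from dataclasses import dataclass, field

@dataclass
class Chunk:
    text: str
    department: str
    min_clearance: int  # minimum clearance level required to read this chunk

@dataclass
class User:
    departments: set[str] = field(default_factory=set)
    clearance: int = 0

def authorized(chunk: Chunk, user: User) -> bool:
    """A chunk is visible only if the user belongs to its department
    and meets its minimum clearance level."""
    return chunk.department in user.departments and user.clearance >= chunk.min_clearance

def filter_results(results: list[Chunk], user: User) -> list[Chunk]:
    return [chunk for chunk in results if authorized(chunk, user)]

retrieved = [
    Chunk("Q3 salary bands by level", department="hr", min_clearance=3),
    Chunk("Public holiday calendar", department="hr", min_clearance=0),
]
analyst = User(departments={"hr"}, clearance=1)
print([c.text for c in filter_results(retrieved, analyst)])  # only the calendar survives
```

Filtering at retrieval time, rather than after generation, matters because anything that enters the prompt can leak into the answer.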
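
Points 10-12 refer to HHEM, which Vectara publishes on Hugging Face as vectara/hallucination_evaluation_model. The snippet below follows the cross-encoder usage shown on the original model card; newer versions of the model may load differently, so treat it as a sketch and check the current card.

```python
# Score whether a generated statement is supported by its source passage.
# Follows the cross-encoder usage from the original HHEM model card;
# newer HHEM releases may require a different loading path.
from sentence_transformers import CrossEncoder

model = CrossEncoder("vectara/hallucination_evaluation_model")

pairs = [
    # [source passage, generated statement]
    ["The meeting was moved to Friday at 3pm.", "The meeting is on Friday afternoon."],
    ["The meeting was moved to Friday at 3pm.", "The meeting was cancelled."],
]
scores = model.predict(pairs)  # close to 1.0 = consistent, close to 0.0 = likely hallucinated
for (source, claim), score in zip(pairs, scores):
    print(f"{score:.2f}  {claim}")
```

In a pipeline, a low score can trigger a retry, a switch to a different model, or a flag for human review, which is how this kind of check helps reduce hallucinations in practice.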

Source: AI Engineer via YouTube

❓ What do you think? What are the most significant considerations when building an enterprise-scale RAG stack, and how do these challenges affect response quality and overall efficiency? Feel free to share your thoughts in the comments!