Exploring New Horizons: LLM Memory with Structure in AI Applications and Graphite for Code Translation
Join Jonathan Larson, leader of the Graph team at Microsoft Research, as he explores the power of LLM memory with structure and its applications in building effective AI applications.
- 1. Jonathan Larson from Microsoft Research leads the Graph team and presented on new horizons in AI applications.
- 2. He emphasized that LLMs (large language models) with structured memory are key enablers for effective AI applications.
- 3. Larson also mentioned that agents paired with these structures can provide something even more powerful.
- 4. The presentation focuses on two main topics: graph applied to the coding domain and a new release called Benchmark QED.
- 5. Graph for code helps drive repository-level understanding in the coding space.
- 6. Larson showed a demonstration of a terminal-based video game analyzed by a regular RAG (retrieval-augmented generation) system and Graphite for code.
- 7. The regular RAG system provided a poor description, while Graphite for code offered a detailed and accurate one.
- 8. Graphite for code excels at global queries, which require understanding the whole repository to answer correctly.
- 9. Larson also demonstrated code translation with Graphite for code, translating Python code into Rust with better results than the same translation attempted without the tool.
- 10. They applied Graphite for code to a larger codebase (Doom) and found it effective for generating documentation and answering questions about modules in their entirety.
- 11. Using graph structures, they added a jump capability to the Doom video game by modifying multiple files, which other AI systems couldn't accomplish effectively.
- 12. Larson introduced Benchmark QED, an open-source tool on GitHub for measuring and evaluating systems like Graphite for code.
- 13. Benchmark QED consists of three components: AutoQ (query generation), AutoE (evaluation using an LLM as a judge), and AutoD (dataset summarization and sampling).
- 14. AutoQ generates both local and global queries, framed either as data-driven questions or as persona/activity-driven questions, covering various aspects of the dataset.
- 15. The evaluation component, AutoE, produces a composite score across four criteria: comprehensiveness, diversity, empowerment, and relevance.
- 16. Lazy Graph outperformed vector RAG on data-local questions, showing dominant performance across the entire span of question types, regardless of context window size.
- 17. Lazy Graph is now being lined up for launch on the Azure and Microsoft Discovery platforms.
- 18. Larson concluded that LLM memory with structure is a powerful tool, and that agents can significantly amplify its capabilities.
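The talk doesn't reveal Graphite's internals, but the "global query" idea in points 6 and 10 can be illustrated with a generic map-reduce pattern over a repository's structure: summarize each module, then combine the summaries, instead of retrieving a few similar chunks as vector RAG would. This is a minimal sketch under that assumption; `summarize_module` is a hypothetical stand-in for an LLM call, not Graphite's actual API.

```python
# Illustrative sketch only (NOT Graphite's implementation): answering a
# repository-wide question by summarizing every module, then reducing.

def summarize_module(name, source):
    # Hypothetical stand-in for an LLM summarization call: here we just
    # take the module's first line as its "summary".
    first_line = source.strip().splitlines()[0]
    return f"{name}: {first_line}"

def answer_global_query(repo):
    """repo: dict mapping module name -> source text.

    Map step: summarize each module. Reduce step: merge the per-module
    summaries into one repository-level answer.
    """
    per_module = [summarize_module(n, s) for n, s in sorted(repo.items())]
    return "Repository overview:\n" + "\n".join(per_module)

repo = {
    "player.py": "# Handles player movement and input",
    "render.py": "# Draws the game world to the terminal",
}
print(answer_global_query(repo))
```

Because every module contributes to the reduced answer, a global question like "what does this game do?" touches the whole repository, which is exactly where the summary says chunk-retrieval RAG falls short.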
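The AutoE evaluation described in point 15 can be sketched as a pairwise LLM-as-judge loop: for each question, a judge picks a winner per criterion, and win rates are averaged into a composite score. This is a hedged illustration of the general pattern, not Benchmark QED's code; the judge is passed in as a function so the scoring logic can be shown deterministically, with `mock_judge` as an obviously hypothetical stand-in for a real LLM call.

```python
# Sketch of pairwise LLM-as-judge scoring (assumed pattern, not the
# Benchmark QED implementation). The four criteria come from the talk.

CRITERIA = ["comprehensiveness", "diversity", "empowerment", "relevance"]

def composite_score(answers_a, answers_b, judge_fn):
    """Return system A's mean win rate over system B across all criteria.

    judge_fn(answer_a, answer_b, criterion) -> "A" or "B"
    """
    wins = 0
    total = 0
    for a, b in zip(answers_a, answers_b):
        for criterion in CRITERIA:
            if judge_fn(a, b, criterion) == "A":
                wins += 1
            total += 1
    return wins / total if total else 0.0

# Hypothetical stand-in judge for illustration: prefers the longer answer.
mock_judge = lambda a, b, criterion: "A" if len(a) >= len(b) else "B"

score = composite_score(
    ["a detailed, grounded answer", "another thorough answer"],
    ["short", "also short"],
    mock_judge,
)
print(score)  # 1.0: system A wins every criterion on every question
```

Swapping `mock_judge` for a function that prompts an LLM with the two answers and one criterion yields the usual LLM-as-judge setup; averaging over criteria gives a single composite number for comparing systems.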
Source: AI Engineer via YouTube