Zed, the AI-Powered Code Editor: An Empirical Approach to Stable Development

Nathan Sobo, co-founder of Zed, an AI-enabled code editor, shares the journey of building a reliable software system: embracing empirical testing, driving out algorithmic problems with stochastic tests, and making rigorous testing fundamental to delivering a robust product.

  • 1. Nathan Sobo, co-founder of Zed, an AI-enabled code editor, gives a talk on the team's approach to testing and delivering reliable AI features.
  • 2. Zed is implemented from scratch in Rust, not a fork of VS Code.
  • 3. The system is engineered like a video game, with a 1,200-line shader program running on the GPU and delivering frames at 120 frames per second.
  • 4. Zed recently launched Agentic Editing and took an empirical approach to test it.
  • 5. They have tens of thousands of tests and are hardcore about testing and being empirical.
  • 6. Previously, their tests could be made fully deterministic even in the presence of non-deterministic elements, and they never had flaky tests on CI (sketch 1 below the list).
  • 7. The introduction of large language models (LLMs) changed this: small changes in input can lead to vastly different outputs.
  • 8. Zed's first evaluation focused on a data-driven approach commonly seen in machine learning.
  • 9. In programming, an eval is more like a test that passes or fails, rather than a data-driven input-output pair.
  • 10. The team made their eval more programmatic by writing code that performs assertions about what the agent actually did (sketch 2 below the list).
  • 11. They discovered and addressed specific failure modes, such as the model making incorrect assumptions when running tools.
  • 12. Zed's editing was initially implemented with tool calls, but for efficiency the team moved to a small tool call that describes the edits and then loops back to the same model.
  • 13. The team ran stochastic tests in their main, regular test suite, aiming for 100% of examples to pass or the build to fail (sketch 3 below the list).
  • 14. Deterministic pieces of the pipeline, such as parsing the model's output and fuzzy matching, were driven out with ordinary deterministic tests (sketch 4 below the list).
  • 15. Specific model behaviors required robust handling, such as emitting empty old-text tags when inserting at the top or end of a document.
  • 16. The model sometimes mismatches XML tags, so they added prompt instructions to always close all tags properly, plus tests for correctness (sketch 5 below the list).
  • 17. Indentation issues were addressed by detecting the indentation delta between the buffer and the LLM output and renormalizing the text accordingly (sketch 6 below the list).
  • 18. Weird escaping behavior required simple prompt fixes to handle constructs like Rust raw strings containing quotes and special characters.
  • 19. The team emphasizes that rigorous testing is fundamental to building reliable software, even when using AI capabilities.
  • 20. Techniques from traditional software engineering remain applicable, but a more statistical approach is needed.
  • 21. Many problems stem from the model doing "stupid" things, each requiring specific countermeasures.
  • 22. Zed is open source under the GPL license and welcomes contributions to improve it.
  • 23. Nathan has found success writing Rust code agentically with the Claude 4 models and appreciates the efficiency of the process.
  • 24. The talk emphasizes being empirical, leveraging existing software testing infrastructure, and applying traditional software engineering techniques when working with AI capabilities.
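
Sketch 1, for item 6: the mechanism behind flake-free randomized tests is deriving all randomness from an explicit seed, so any failure can be replayed exactly. This is a minimal illustration using the `rand` crate, not Zed's actual test harness:

```rust
use rand::{rngs::StdRng, Rng, SeedableRng};

// All randomness flows from one explicit seed, so a failing CI run can be
// reproduced exactly by re-running with the seed it reported.
fn random_ops(rng: &mut StdRng, count: usize) -> Vec<(usize, char)> {
    (0..count)
        .map(|_| (rng.gen_range(0..1000), rng.gen_range(b'a'..=b'z') as char))
        .collect()
}

#[test]
fn randomized_test_is_reproducible() {
    // A real harness would read the seed from the environment and print it
    // on failure; it is hard-coded here to keep the sketch self-contained.
    let seed = 42;
    let first = random_ops(&mut StdRng::seed_from_u64(seed), 16);
    let second = random_ops(&mut StdRng::seed_from_u64(seed), 16);
    assert_eq!(first, second, "the same seed must replay the same operations");
}
```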
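
Sketch 2, for item 10: a programmatic eval runs the agent and asserts properties of its behavior rather than diffing its output against a golden answer. Everything here (`EvalOutcome`, `run_agent_eval`, the specific assertions) is hypothetical, standing in for whatever Zed's eval code actually checks:

```rust
// Hypothetical record of one agent run; a real eval would fill this in by
// invoking the model against a fixture project.
struct EvalOutcome {
    files_edited: Vec<String>,
    project_compiles: bool,
    hallucinated_paths: Vec<String>,
}

fn run_agent_eval(_prompt: &str) -> EvalOutcome {
    // Stubbed so the sketch is self-contained; the real version calls the LLM.
    EvalOutcome {
        files_edited: vec!["src/parser.rs".into()],
        project_compiles: true,
        hallucinated_paths: Vec::new(),
    }
}

#[test]
fn eval_agent_adds_error_handling() {
    let outcome = run_agent_eval("Add error handling to the parser");
    // Assertions about what the agent did, not about exact output text:
    assert!(!outcome.files_edited.is_empty(), "agent made no edits");
    assert!(outcome.project_compiles, "project no longer compiles");
    assert!(
        outcome.hallucinated_paths.is_empty(),
        "agent referenced nonexistent files: {:?}",
        outcome.hallucinated_paths
    );
}
```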
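
Sketch 3, for item 13: one shape a stochastic test with a 100% bar can take inside an ordinary suite, assuming some `run_eval_once` that exercises the model end to end (stubbed here):

```rust
// Stub standing in for one end-to-end run against the model.
fn run_eval_once(_iteration: usize) -> bool {
    true
}

#[test]
fn stochastic_eval_requires_every_iteration_to_pass() {
    const ITERATIONS: usize = 25; // sample size is illustrative
    let failures: Vec<usize> = (0..ITERATIONS).filter(|&i| !run_eval_once(i)).collect();
    // The bar described in the talk: 100% of examples pass, or the build fails.
    assert!(
        failures.is_empty(),
        "eval failed on iterations {:?} out of {}",
        failures,
        ITERATIONS
    );
}
```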
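
Sketch 4, for item 14: the fuzzy matching that locates the model's old text in the buffer is a deterministic algorithm, so it can be pinned down with ordinary unit tests. The matcher below (whitespace normalization only) is a deliberately crude stand-in for whatever Zed actually uses:

```rust
// Collapse all runs of whitespace so spacing drift does not break matching.
fn normalize_ws(s: &str) -> String {
    s.split_whitespace().collect::<Vec<_>>().join(" ")
}

// Report whether `needle` occurs in `buffer`, ignoring whitespace differences.
fn fuzzy_contains(buffer: &str, needle: &str) -> bool {
    normalize_ws(buffer).contains(&normalize_ws(needle))
}

#[test]
fn matches_despite_whitespace_drift() {
    let buffer = "fn add(a: i32,  b: i32) -> i32 {\n    a + b\n}";
    let needle = "fn add(a: i32, b: i32) -> i32 { a + b }";
    assert!(fuzzy_contains(buffer, needle));
}
```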
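
Sketch 5, for item 16: alongside the prompt instruction to close tags, the model's output can be checked mechanically before edits are applied. The `old_text`/`new_text` tag names are illustrative:

```rust
// Count opening and closing occurrences of a tag in the model's output and
// reject the response when they disagree.
fn tags_balanced(output: &str, tag: &str) -> bool {
    let opens = output.matches(&format!("<{tag}>")).count();
    let closes = output.matches(&format!("</{tag}>")).count();
    opens == closes
}

#[test]
fn detects_unclosed_old_text_tag() {
    let bad = "<old_text>fn main() {}\n<new_text>fn main() { run(); }</new_text>";
    assert!(!tags_balanced(bad, "old_text"));
    assert!(tags_balanced(bad, "new_text"));
}
```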
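
Sketch 6, for item 17: measure the indentation delta between the text as it sits in the buffer and the model's echo of it, then shift the replacement text by that delta. A minimal, spaces-only sketch keyed off the first line; real handling would also need tabs and edge cases:

```rust
fn leading_spaces(line: &str) -> usize {
    line.chars().take_while(|c| *c == ' ').count()
}

// Shift every line of the model's replacement text by the indentation
// difference between the buffer's old text and the model's copy of it.
fn renormalize_indent(buffer_old: &str, model_old: &str, model_new: &str) -> String {
    let buffer_indent = buffer_old.lines().next().map_or(0, leading_spaces) as isize;
    let model_indent = model_old.lines().next().map_or(0, leading_spaces) as isize;
    let delta = buffer_indent - model_indent;

    model_new
        .lines()
        .map(|line| {
            let adjusted = (leading_spaces(line) as isize + delta).max(0) as usize;
            format!("{}{}", " ".repeat(adjusted), line.trim_start_matches(' '))
        })
        .collect::<Vec<_>>()
        .join("\n")
}

#[test]
fn shifts_new_text_to_match_buffer_indent() {
    // Buffer is indented 8 spaces; the model echoed it with 4, so its
    // replacement must be shifted right by 4 to fit the buffer.
    let out = renormalize_indent("        let x = 1;", "    let x = 1;", "    let x = 2;");
    assert_eq!(out, "        let x = 2;");
}
```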

Source: AI Engineer via YouTube
