AI-Powered Code Editor Zed: An Empirical Approach to Stable Development
Nathan Sobo, co-founder of the AI-enabled code editor Zed, shares the journey of building a reliable software system: embracing empirical testing, driving out algorithmic problems with stochastic tests, and making rigorous testing fundamental to delivering a robust product.
- 1. Nathan Sobo, co-founder of Zed, an AI-enabled code editor, gave this talk on their approach to testing and delivering reliable AI features.
- 2. Zed is implemented from scratch in Rust, not a fork of VS Code.
- 3. The system is engineered like a video game, with a 1,200-line shader program running on the GPU and rendering at 120 frames per second.
- 4. Zed recently launched Agentic Editing and took an empirical approach to test it.
- 5. They have tens of thousands of tests and are hardcore about testing and being empirical.
- 6. Previously, their tests were fully deterministic even in the presence of non-deterministic elements, so they never had flaky tests on CI.
- 7. However, the introduction of large language models (LLMs) changed this as small changes in input can lead to vastly different outputs.
- 8. Zed's first evaluation used the data-driven approach commonly seen in machine learning.
- 9. In programming, an eval is more like a test that passes or fails, rather than a data-driven input-output pair.
- 10. The team made their eval more programmatic by writing code that performs assertions about what the agent did.
- 11. They discovered and addressed specific failure modes, such as incorrect assumptions when running tools.
- 12. Zed initially implemented editing with tool calls, but for efficiency moved to a small tool call that describes the edits and then loops back to the same model.
- 13. The team ran stochastic tests in their main regular test suite, requiring 100% of examples to pass; otherwise the build fails.
- 14. They drove non-determinism out of components like parsing and fuzzy matching by covering them with deterministic tests.
- 15. Specific model behavior, such as emitting empty old-text tags when inserting at the top or end of a document, required robust handling.
- 16. The model might mismatch XML tags, so they added prompt instructions to always close all tags properly and tested for correctness.
- 17. Indentation issues were addressed by detecting indent delta and renormalizing text in the buffer and LLM output.
- 18. Weird escaping behavior required simple prompt fixes to handle constructs like Rust raw strings containing quotes and special characters.
- 19. The team emphasizes that rigorous testing is fundamental to building reliable software, even when using AI capabilities.
- 20. Techniques from traditional software engineering remain applicable, but a more statistical approach is needed.
- 21. Many problems stem from the model's "stupid" actions, requiring specific countermeasures.
- 22. Zed is open source under the GPL and welcomes contributions to improve it.
- 23. Nathan has found success in writing Rust code agentically with Claude 4 models and appreciates the efficiency of the process.
- 24. The talk emphasizes the importance of being empirical, leveraging existing software testing infrastructure, and applying traditional software engineering techniques when working with AI capabilities.
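The stochastic-test idea in points 10 and 13 can be sketched in Rust. This is a minimal, hypothetical illustration (the talk does not show Zed's actual harness): a runner executes a non-deterministic eval many times and fails the whole build unless 100% of iterations pass, turning an agentic eval into an ordinary pass/fail test with assertions.

```rust
// Hypothetical sketch of a stochastic eval runner. In practice the `eval`
// closure would drive the agent end to end and assert on what it did;
// here a stand-in closure keeps the example self-contained.
fn run_stochastic_eval<F>(iterations: usize, eval: F) -> Result<(), String>
where
    F: Fn(usize) -> bool,
{
    // Collect every failing iteration so the error message is actionable.
    let failures: Vec<usize> = (0..iterations).filter(|&i| !eval(i)).collect();
    if failures.is_empty() {
        Ok(())
    } else {
        // Any single failure fails the build: the target is a 100% pass rate.
        Err(format!(
            "{}/{} iterations failed: {:?}",
            failures.len(),
            iterations,
            failures
        ))
    }
}

fn main() {
    // A trivially deterministic "eval" stands in for an agentic run.
    let result = run_stochastic_eval(100, |_i| true);
    assert!(result.is_ok());
    println!("all iterations passed");
}
```

Folding such runs into the regular test suite, rather than a separate eval pipeline, is what lets a statistical regression block CI like any other test failure.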
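The tag-handling failure modes in points 15 and 16 can be made concrete with a tolerant extractor. This is a sketch under assumed tag names (`old_text`/`new_text` are illustrative, not confirmed as Zed's exact format): it recovers a tag's content even when the model forgets the closing tag by falling back to the next opening angle bracket, and it returns an empty string for an empty tag, which the caller can treat as an insertion at the top or end of the document.

```rust
// Hypothetical sketch: tolerant extraction of <tag>...</tag> content from
// model output. If the closing tag is missing (a mismatched-tags failure
// mode), fall back to the start of the next tag. An empty result (e.g.
// an empty old-text tag) can signal insertion at the document boundary.
// Note: content containing a literal '<' would defeat the fallback; a
// real parser would need a stricter recovery rule.
fn extract_tag(input: &str, tag: &str) -> Option<String> {
    let open = format!("<{}>", tag);
    let close = format!("</{}>", tag);
    let start = input.find(&open)? + open.len();
    let rest = &input[start..];
    // Prefer the proper closing tag; otherwise stop at the next '<'.
    let end = rest
        .find(&close)
        .or_else(|| rest.find('<'))
        .unwrap_or(rest.len());
    Some(rest[..end].to_string())
}

fn main() {
    // Well-formed output parses normally.
    assert_eq!(
        extract_tag("<old_text>abc</old_text>", "old_text"),
        Some("abc".to_string())
    );
    // Missing closing tag: recover by stopping at the next tag.
    assert_eq!(
        extract_tag("<old_text>abc<new_text>def</new_text>", "old_text"),
        Some("abc".to_string())
    );
    println!("tolerant parsing ok");
}
```

Because this parsing is ordinary deterministic code, it can be exercised exhaustively with normal unit tests, independent of the model.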
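The indent-delta idea in point 17 can also be sketched. Assuming space-only indentation for simplicity (real editors must also handle tabs), the function names here are hypothetical: compare the indentation of the text the model quoted with the matching text in the buffer, then shift every line of the model's replacement by that delta so the edit lands at the buffer's real indentation.

```rust
// Hypothetical sketch of indentation renormalization between a buffer
// and LLM output. Assumes space-only indentation.

fn leading_spaces(line: &str) -> usize {
    line.chars().take_while(|c| *c == ' ').count()
}

/// Indent delta = buffer indent minus model indent, measured on the
/// first non-empty line of each.
fn indent_delta(buffer_text: &str, model_text: &str) -> isize {
    let b = buffer_text
        .lines()
        .find(|l| !l.trim().is_empty())
        .map_or(0, leading_spaces);
    let m = model_text
        .lines()
        .find(|l| !l.trim().is_empty())
        .map_or(0, leading_spaces);
    b as isize - m as isize
}

/// Shift every non-empty line of `text` by `delta` spaces, clamping at zero.
fn renormalize(text: &str, delta: isize) -> String {
    text.lines()
        .map(|line| {
            if line.trim().is_empty() {
                line.to_string()
            } else {
                let current = leading_spaces(line) as isize;
                let new_indent = (current + delta).max(0) as usize;
                format!("{}{}", " ".repeat(new_indent), line.trim_start())
            }
        })
        .collect::<Vec<_>>()
        .join("\n")
}

fn main() {
    // The buffer is indented 4 spaces deeper than the model's quote.
    let delta = indent_delta("    fn foo() {", "fn foo() {");
    assert_eq!(delta, 4);
    assert_eq!(renormalize("foo\n  bar", delta), "    foo\n      bar");
    println!("renormalized with delta {}", delta);
}
```

Relative indentation within the model's output is preserved; only the absolute offset is corrected, which matches the "detect the delta, then renormalize" framing in the talk.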
Source: AI Engineer via YouTube