Zed, the AI-Powered Code Editor: An Empirical Approach to Stable Development

Nathan Sobo, co-founder of Zed, an AI-enabled code editor, shares the journey of building a reliable software system: embracing empirical testing, driving out algorithmic problems with stochastic tests, and making rigorous testing fundamental to delivering a robust product.

  • 1. Nathan Sobo, co-founder of Zed, an AI-enabled code editor, gives a talk on the team's approach to testing and delivering reliable AI features.
  • 2. Zed is implemented from scratch in Rust, not a fork of VS Code.
  • 3. The system is engineered like a video game, with a 1,200-line shader program running on the GPU and delivering frames at 120 frames per second.
  • 4. Zed recently launched Agentic Editing and took an empirical approach to test it.
  • 5. They have tens of thousands of tests and are hardcore about testing and being empirical.
  • 6. Previously, their tests could be made fully deterministic even in the presence of non-deterministic elements, and they never had flaky tests on CI (sketch 1 below the list).
  • 7. The introduction of large language models (LLMs) changed this: small changes in input can lead to vastly different outputs.
  • 8. Zed's first evaluation focused on a data-driven approach commonly seen in machine learning.
  • 9. In programming, an eval is more like a test that passes or fails, rather than a data-driven input-output pair.
  • 10. The team made their eval more programmatic by writing code that performs assertions about what the agent actually did (sketch 2 below the list).
  • 11. They discovered and addressed specific failure modes, such as the model making incorrect assumptions when running tools.
  • 12. Zed's editing was initially implemented with tool calls, but for efficiency the team moved to a small tool call that describes the edits and then loops back to the same model.
  • 13. The team ran stochastic tests in their main, regular test suite, aiming for 100% of examples to pass or the build to fail (sketch 3 below the list).
  • 14. Deterministic pieces of the pipeline, such as parsing the model's output and fuzzy matching, were driven out with ordinary deterministic tests (sketch 4 below the list).
  • 15. Specific model behaviors required robust handling, such as emitting empty old-text tags when inserting at the top or end of a document.
  • 16. The model sometimes mismatches XML tags, so they added prompt instructions to always close all tags properly, plus tests for correctness (sketch 5 below the list).
  • 17. Indentation issues were addressed by detecting the indentation delta between the buffer and the LLM output and renormalizing the text accordingly (sketch 6 below the list).
  • 18. Weird escaping behavior required simple prompt fixes to handle constructs like Rust raw strings containing quotes and special characters.
  • 19. The team emphasizes that rigorous testing is fundamental to building reliable software, even when using AI capabilities.
  • 20. Techniques from traditional software engineering remain applicable, but a more statistical approach is needed.
  • 21. Many problems stem from the model doing "stupid" things, each requiring specific countermeasures.
  • 22. Zed is open source under the GPL license and welcomes contributions to improve it.
  • 23. Nathan has found success writing Rust code agentically with the Claude 4 models and appreciates the efficiency of the process.
  • 24. The talk emphasizes being empirical, leveraging existing software testing infrastructure, and applying traditional software engineering techniques when working with AI capabilities.
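
Sketch 1, for item 6: the mechanism behind flake-free randomized tests is deriving all randomness from an explicit seed, so any failure can be replayed exactly. This is a minimal illustration using the `rand` crate, not Zed's actual test harness:

```rust
use rand::{rngs::StdRng, Rng, SeedableRng};

// All randomness flows from one explicit seed, so a failing CI run can be
// reproduced exactly by re-running with the seed it reported.
fn random_ops(rng: &mut StdRng, count: usize) -> Vec<(usize, char)> {
    (0..count)
        .map(|_| (rng.gen_range(0..1000), rng.gen_range(b'a'..=b'z') as char))
        .collect()
}

#[test]
fn randomized_test_is_reproducible() {
    // A real harness would read the seed from the environment and print it
    // on failure; it is hard-coded here to keep the sketch self-contained.
    let seed = 42;
    let first = random_ops(&mut StdRng::seed_from_u64(seed), 16);
    let second = random_ops(&mut StdRng::seed_from_u64(seed), 16);
    assert_eq!(first, second, "the same seed must replay the same operations");
}
```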
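
Sketch 2, for item 10: a programmatic eval runs the agent and asserts properties of its behavior rather than diffing its output against a golden answer. Everything here (`EvalOutcome`, `run_agent_eval`, the specific assertions) is hypothetical, standing in for whatever Zed's eval code actually checks:

```rust
// Hypothetical record of one agent run; a real eval would fill this in by
// invoking the model against a fixture project.
struct EvalOutcome {
    files_edited: Vec<String>,
    project_compiles: bool,
    hallucinated_paths: Vec<String>,
}

fn run_agent_eval(_prompt: &str) -> EvalOutcome {
    // Stubbed so the sketch is self-contained; the real version calls the LLM.
    EvalOutcome {
        files_edited: vec!["src/parser.rs".into()],
        project_compiles: true,
        hallucinated_paths: Vec::new(),
    }
}

#[test]
fn eval_agent_adds_error_handling() {
    let outcome = run_agent_eval("Add error handling to the parser");
    // Assertions about what the agent did, not about exact output text:
    assert!(!outcome.files_edited.is_empty(), "agent made no edits");
    assert!(outcome.project_compiles, "project no longer compiles");
    assert!(
        outcome.hallucinated_paths.is_empty(),
        "agent referenced nonexistent files: {:?}",
        outcome.hallucinated_paths
    );
}
```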
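
Sketch 3, for item 13: one shape a stochastic test with a 100% bar can take inside an ordinary suite, assuming some `run_eval_once` that exercises the model end to end (stubbed here):

```rust
// Stub standing in for one end-to-end run against the model.
fn run_eval_once(_iteration: usize) -> bool {
    true
}

#[test]
fn stochastic_eval_requires_every_iteration_to_pass() {
    const ITERATIONS: usize = 25; // sample size is illustrative
    let failures: Vec<usize> = (0..ITERATIONS).filter(|&i| !run_eval_once(i)).collect();
    // The bar described in the talk: 100% of examples pass, or the build fails.
    assert!(
        failures.is_empty(),
        "eval failed on iterations {:?} out of {}",
        failures,
        ITERATIONS
    );
}
```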
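
Sketch 4, for item 14: the fuzzy matching that locates the model's old text in the buffer is a deterministic algorithm, so it can be pinned down with ordinary unit tests. The matcher below (whitespace normalization only) is a deliberately crude stand-in for whatever Zed actually uses:

```rust
// Collapse all runs of whitespace so spacing drift does not break matching.
fn normalize_ws(s: &str) -> String {
    s.split_whitespace().collect::<Vec<_>>().join(" ")
}

// Report whether `needle` occurs in `buffer`, ignoring whitespace differences.
fn fuzzy_contains(buffer: &str, needle: &str) -> bool {
    normalize_ws(buffer).contains(&normalize_ws(needle))
}

#[test]
fn matches_despite_whitespace_drift() {
    let buffer = "fn add(a: i32,  b: i32) -> i32 {\n    a + b\n}";
    let needle = "fn add(a: i32, b: i32) -> i32 { a + b }";
    assert!(fuzzy_contains(buffer, needle));
}
```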
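
Sketch 5, for item 16: alongside the prompt instruction to close tags, the model's output can be checked mechanically before edits are applied. The `old_text`/`new_text` tag names are illustrative:

```rust
// Count opening and closing occurrences of a tag in the model's output and
// reject the response when they disagree.
fn tags_balanced(output: &str, tag: &str) -> bool {
    let opens = output.matches(&format!("<{tag}>")).count();
    let closes = output.matches(&format!("</{tag}>")).count();
    opens == closes
}

#[test]
fn detects_unclosed_old_text_tag() {
    let bad = "<old_text>fn main() {}\n<new_text>fn main() { run(); }</new_text>";
    assert!(!tags_balanced(bad, "old_text"));
    assert!(tags_balanced(bad, "new_text"));
}
```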
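
Sketch 6, for item 17: measure the indentation delta between the text as it sits in the buffer and the model's echo of it, then shift the replacement text by that delta. A minimal, spaces-only sketch keyed off the first line; real handling would also need tabs and edge cases:

```rust
fn leading_spaces(line: &str) -> usize {
    line.chars().take_while(|c| *c == ' ').count()
}

// Shift every line of the model's replacement text by the indentation
// difference between the buffer's old text and the model's copy of it.
fn renormalize_indent(buffer_old: &str, model_old: &str, model_new: &str) -> String {
    let buffer_indent = buffer_old.lines().next().map_or(0, leading_spaces) as isize;
    let model_indent = model_old.lines().next().map_or(0, leading_spaces) as isize;
    let delta = buffer_indent - model_indent;

    model_new
        .lines()
        .map(|line| {
            let adjusted = (leading_spaces(line) as isize + delta).max(0) as usize;
            format!("{}{}", " ".repeat(adjusted), line.trim_start_matches(' '))
        })
        .collect::<Vec<_>>()
        .join("\n")
}

#[test]
fn shifts_new_text_to_match_buffer_indent() {
    // Buffer is indented 8 spaces; the model echoed it with 4, so its
    // replacement must be shifted right by 4 to fit the buffer.
    let out = renormalize_indent("        let x = 1;", "    let x = 1;", "    let x = 2;");
    assert_eq!(out, "        let x = 2;");
}
```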

Source: AI Engineer via YouTube
