Unpacking AI Planning: Overcoming Instruction Limitations of Language Models

Join me as we explore the limitations of language models in following instructions, and discover how AI agents with planning capabilities can overcome these challenges and achieve better results.

1. OpenAI released InstructGPT in 2022, a model capable of following instructions and completing tasks according to user intent.
2. Despite this, language models (LMs) still struggle with following instructions effectively in 2025.
3. This is due to the increasing complexity of user prompts, which now include more information, context, constraints, and requirements.
4. For even simple tasks, LMs alone are no longer sufficient for instruction following, leading to the need for AI agents that employ planning.
5. An AI agent is a system that follows instructions by executing actions and making decisions based on its goals.
6. The definition of an AI agent varies: some consider a large language model (LLM) itself to be an agent, while others apply the term to routers that direct queries to specialized LLMs.
7. Function calling is another aspect of AI agents, where external tools are provided for the LM to interact with other APIs and services.
8. ReAct (reasoning and acting), which repeats a loop of thought, action, and observation, is a popular agent implementation that can be used with any LLM.
9. Planning is the process of determining the steps needed to reach a goal, especially in complex tasks requiring parallelization and explainability.
10. Planning is essential when straightforward solutions aren't available, and it enables better understanding, control, and monitoring of an agent's decision-making process.
11. Planners come in several forms, such as text-based or code-based plans, and dynamic planning allows for replanning during execution.
12. Efficient planning involves using smart execution engines that analyze dependencies between steps, enabling parallel execution.
13. Trade-offs between speed and cost can be managed through techniques such as branch prediction for faster systems.
14. AI21 Maestro is an example of a system that combines planning and smart execution engines to improve instruction following and requirement satisfaction.
15. Maestro separates context, task, and requirements from the prompt, making it easier to validate and choose suitable candidates in each step.
16. Execution trees or graphs are used by the planner and execution engine to select and refine promising options based on predefined budgets.
17. Validation through iterations and improvements is essential for ensuring high-quality results from LLMs combined with planning and smart execution engines.
18. Maestro has been shown to improve the performance of popular LMs such as GPT-4o, Claude 3.5 Sonnet, and o3-mini.
19. While using LLMs alone may be sufficient for some tasks, incorporating tools, ReAct, or planning and execution engines can enhance performance and requirement satisfaction.
20. When faced with complex tasks, consider using planning and execution engines to achieve better results.
21. AI21 invites users to join the Maestro waitlist to explore its capabilities and determine if it's suitable for their needs.
22. LLMs alone may not always be enough for even simple tasks like instruction following, as they can struggle with complex prompts.
23. Starting with simple solutions, such as using SLMs or incorporating tools into a model, is recommended before moving on to more complex methods like planning and execution engines.
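
The idea in point 12, an execution engine that analyzes dependencies between plan steps and runs independent steps in parallel, can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not how any particular product implements it: the names `execution_waves`, `run_plan`, `PLAN_DEPS`, and `PLAN_STEPS` are hypothetical, and the step functions are stand-ins for real LLM or tool calls.

```python
from concurrent.futures import ThreadPoolExecutor


def execution_waves(deps):
    """Group steps into waves; every step in a wave has all its dependencies done."""
    remaining = {step: set(needs) for step, needs in deps.items()}
    done = set()
    waves = []
    while remaining:
        # A step is ready once everything it depends on has already finished.
        ready = sorted(s for s, needs in remaining.items() if needs <= done)
        if not ready:
            raise ValueError("plan has a cycle or an unknown dependency")
        waves.append(ready)
        done.update(ready)
        for step in ready:
            del remaining[step]
    return waves


def run_plan(steps, deps, max_workers=4):
    """Run the plan wave by wave; steps within one wave execute in parallel."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for wave in execution_waves(deps):
            futures = {
                name: pool.submit(steps[name], {d: results[d] for d in deps[name]})
                for name in wave
            }
            for name, future in futures.items():
                results[name] = future.result()
    return results


# Hypothetical plan: two independent lookups feed one final summary step.
PLAN_DEPS = {
    "fetch_specs": [],
    "fetch_reviews": [],
    "summarize": ["fetch_specs", "fetch_reviews"],
}

# Stand-in step functions; a real system would call an LLM or an external tool here.
PLAN_STEPS = {
    "fetch_specs": lambda inputs: "specs",
    "fetch_reviews": lambda inputs: "reviews",
    "summarize": lambda inputs: (
        f"summary of {inputs['fetch_specs']} and {inputs['fetch_reviews']}"
    ),
}

if __name__ == "__main__":
    print(execution_waves(PLAN_DEPS))
    print(run_plan(PLAN_STEPS, PLAN_DEPS)["summarize"])
```

Here the two fetch steps land in the same wave and run concurrently, while `summarize` waits for both. Grouping by waves is the simplest design. An engine that starts each step the moment its own dependencies finish would reduce idle time further, at the cost of more bookkeeping.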

Source: AI Engineer via YouTube
