Build Cross-Platform Applications with Local AI: Why and How with Microsoft Foundry

Hi, I'm Ima, Program Manager at Microsoft, and today I'm excited to talk about Foundry Local, an end-to-end AI inference solution that enables developers to easily build cross-platform applications powered by local AI.

  • 1. Ima, a program manager at Microsoft, is discussing Foundry Local, which enables developers to build cross-platform applications with local AI.
  • 2. Key reasons for using local AI instead of cloud AI:
  • a. Low network bandwidth or offline access
  • b. Privacy and security concerns
  • c. Real-time latency requirements
  • 3. Local AI has become a reality thanks to increasingly powerful computing hardware and the model optimization techniques developed over recent decades.
  • 4. Microsoft has several assets that Foundry Local builds on:
  • a. Azure AI Foundry with 70,000+ organizations and 1,900+ models
  • b. ONNX Runtime, a cross-platform, high-performance on-device inference engine with over 10 million downloads per month
  • c. The scale and reach of Windows on client devices
  • 5. Foundry Local includes:
  • a. ONNX Runtime for performance acceleration across various hardware
  • b. A new Foundry Local management service for hosting and managing models on client devices
  • c. A connection to Azure AI Foundry for downloading open-source models on demand
  • d. Foundry Local CLI for exploring models on the device
  • e. SDKs for easily integrating Foundry Local into applications
  • 6. Foundry Local was announced at the Microsoft Build conference and is available on Windows and macOS.
  • 7. Over 100 customers have joined the private preview of Foundry Local, providing positive feedback on its ease of use and performance.
  • 8. Local AI enables offline-first AI applications, which are essential for processing sensitive data in restricted environments.
  • 9. Foundry Local supports popular generative AI models with variants optimized for different hardware, such as CPUs, CUDA GPUs, and integrated GPUs.
  • 10. The Qwen 2.5 model can process around 90 tokens per second with hardware acceleration enabled.
  • 11. The Phi-4 mini model provides more detailed responses than the Qwen model but has a larger model size.
  • 12. Foundry Local is useful for building cross-platform AI applications that run directly on devices, for example generating high-level summaries of internal projects.
  • 13. Foundry Local offers Python and JavaScript SDKs for integration into applications.
  • 14. The agent feature in Foundry Local is still in private preview but allows users to create, build, and run local agents using local models and MCP servers.
  • 15. An agent consists of one model and one or more MCP servers based on user needs.
  • 16. Example agents include an OCR agent that extracts text from images and a speech-to-text service running locally.
  • 17. Within an agent, Foundry Local can expose tools such as file-system management and OCR.
  • 18. Users can run queries with agents to perform specific tasks, such as finding and processing a receipt to get the total amount.
  • 19. Foundry Local has unlocked significant potential for local AI applications, but users should not expect it to match cloud-hosted models or agents in every aspect.
  • 20. More information on Foundry Local can be found via the provided link, and interested users can sign up for the private preview of the agent feature.
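As point 13 notes, Foundry Local ships Python and JavaScript SDKs, and the local management service speaks an OpenAI-compatible REST API. The sketch below builds an OpenAI-style chat-completions payload for that local endpoint; the base URL, port, and model alias are assumptions for illustration (discover the real values with the SDK or CLI on your machine), and the actual HTTP call is shown commented out since it requires Foundry Local to be running.

```python
import json

# Hypothetical local endpoint; Foundry Local assigns the real port at startup,
# so query it via the SDK or CLI rather than hard-coding it.
BASE_URL = "http://localhost:5273/v1"

def build_chat_request(model: str, prompt: str) -> dict:
    """Build an OpenAI-compatible chat-completions payload for a local model."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }

payload = build_chat_request(
    "phi-4-mini",  # assumed model alias
    "Give a high-level summary of this internal project in two sentences.",
)
print(json.dumps(payload, indent=2))

# To actually send it (requires Foundry Local running locally):
# import urllib.request
# req = urllib.request.Request(
#     f"{BASE_URL}/chat/completions",
#     data=json.dumps(payload).encode(),
#     headers={"Content-Type": "application/json"},
# )
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

Because the wire format is OpenAI-compatible, existing OpenAI client libraries can usually be pointed at the local endpoint by swapping the base URL.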
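Points 15–18 describe an agent as one local model plus one or more MCP servers. The agent feature is still in private preview and its API is not public, so the following is purely a conceptual sketch with invented class and field names, illustrating that composition with the OCR agent from the talk as the example.

```python
from dataclasses import dataclass, field

@dataclass
class McpServer:
    """A tool provider the agent can call (names/endpoints are illustrative)."""
    name: str
    endpoint: str

@dataclass
class LocalAgent:
    """Conceptual agent: exactly one local model, one or more MCP servers."""
    model: str
    mcp_servers: list = field(default_factory=list)

    def describe(self) -> str:
        tools = ", ".join(s.name for s in self.mcp_servers)
        return f"agent(model={self.model}, tools=[{tools}])"

# An OCR-style agent like the demo: a local model with file-system and OCR tools,
# able to answer queries such as "find the receipt and return the total amount".
ocr_agent = LocalAgent(
    model="phi-4-mini",  # assumed model alias
    mcp_servers=[
        McpServer("filesystem", "stdio://filesystem-mcp"),  # invented endpoint
        McpServer("ocr", "stdio://ocr-mcp"),                # invented endpoint
    ],
)
print(ocr_agent.describe())
```

The point of the sketch is the shape, not the API: the model supplies reasoning while each MCP server contributes a narrow capability, so agents are assembled per task rather than built into the model.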

Source: AI Engineer via YouTube

❓ What do you think? What are your thoughts on the ideas shared in this video? Feel free to share your thoughts in the comments!