Exploring Mistral AI's Open-Source Models: From Mistral 7B to Codestral 22B
Welcome to a look at Mistral AI's journey of open-sourcing models: democratizing frontier AI for developers and changing how language models are trained, fine-tuned, and deployed.
- 1. The speaker introduces Mistral AI's open models.
- 2. Mistral AI was founded last June and released its first open model, Mistral 7B, in September.
- 3. In December, they released a sparse mixture-of-experts open model, Mixtral 8x7B.
- 4. They also released their platform, with model APIs and the commercial models Mistral Medium and Mistral Embed.
- 5. In February, they released their flagship model, Mistral Large, which has superior reasoning and math abilities.
- 6. In April, they released a new open model, Mixtral 8x22B.
- 7. In June, they released a code-specific model called Codestral 22B, available for free use on their chat interface.
- 8. Mistral AI's mission is to bring frontier AI into everyone's hands, with a focus on building cutting-edge AI for developers.
- 9. Four principles guide how they train and release models: openness, portability, optimized performance, and customizability.
- 10. Openness: they train best-in-class open models and release them to the open-source community.
- 11. Portability: all models are available on Azure, AWS, GCP, and in virtual private clouds, and can be deployed on-premises with full control over security and privacy.
- 12. Optimized performance: they aim for an optimal performance-to-size ratio in their models.
- 13. Customizability: they provide libraries and tools for customizing their models to specific applications.
- 14. Mistral AI has open-sourced three models in the past year: a dense Transformer (Mistral 7B), a sparse mixture-of-experts model (Mixtral 8x7B), and a larger version of the latter (Mixtral 8x22B).
- 15. Mistral 7B was the first model of its size to reach 60% on MMLU, considered the bare minimum for a genuinely useful model.
- 16. The sparse mixture-of-experts architecture pushes performance while keeping the inference budget in check by activating only a small subset of parameters for each token (see the routing sketch after this list).
- 17. Mistral AI sees open source as complementary to its commercial business, not in competition with it.
- 18. Open-sourcing models serves as a branding and marketing tool, creating awareness of their products and helping them acquire customers.
- 19. The open models are trained in three stages: pre-training, instruction tuning, and learning from human feedback.
- 20. Pre-training teaches the model to predict the next token in a sequence by training on massive datasets of trillions of tokens (sketched after this list).
- 21. Instruction tuning uses prompt-response pairs to teach the model how humans want to interact with it.
- 22. Learning from human feedback scales data collection faster and optimizes model behavior against human preferences.
- 23. Mistral AI's open models, including Codestral 22B, are trained using these techniques.
- 24. Codestral 22B is a dense Transformer model designed specifically for code, fluent in 80+ programming languages, and offers both code completion and code question-answering (a usage sketch follows this list).
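
Point 16's routing idea is the core of the Mixtral models. Below is a minimal, illustrative sketch of top-2 expert routing in PyTorch; the dimensions, expert count, and gating details are assumptions chosen for demonstration, not Mistral's actual configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoE(nn.Module):
    """Toy sparse mixture-of-experts layer with top-2 routing."""

    def __init__(self, dim=512, hidden=2048, n_experts=8, top_k=2):
        super().__init__()
        self.gate = nn.Linear(dim, n_experts, bias=False)  # router
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, hidden), nn.SiLU(), nn.Linear(hidden, dim))
            for _ in range(n_experts)
        )
        self.top_k = top_k

    def forward(self, x):  # x: (num_tokens, dim)
        scores = self.gate(x)                              # score each expert per token
        weights, chosen = scores.topk(self.top_k, dim=-1)  # keep only the top-k experts
        weights = F.softmax(weights, dim=-1)               # renormalize over chosen experts
        out = torch.zeros_like(x)
        # Each token flows through only top_k of the n_experts feed-forward
        # blocks, so the active parameter count per token stays small even
        # though the total capacity is large.
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = chosen[:, k] == e
                if mask.any():
                    out[mask] += weights[mask, k, None] * expert(x[mask])
        return out

tokens = torch.randn(4, 512)      # 4 tokens of width 512
print(SparseMoE()(tokens).shape)  # torch.Size([4, 512])
```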
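To make the pre-training stage from points 19-20 concrete, here is a minimal sketch of the next-token prediction objective, assuming a generic stand-in model; instruction tuning applies the same loss to prompt-response pairs, typically masked so only response tokens contribute.

```python
import torch
import torch.nn.functional as F

def next_token_loss(model, token_ids):
    """Cross-entropy loss for next-token prediction. token_ids: (batch, seq_len)."""
    inputs, targets = token_ids[:, :-1], token_ids[:, 1:]  # target = input shifted by one
    logits = model(inputs)                                 # (batch, seq_len - 1, vocab_size)
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),               # (N, vocab_size)
        targets.reshape(-1),                               # (N,)
    )

# Toy stand-in model; real pre-training would use a full decoder Transformer.
vocab_size = 1000
toy_model = torch.nn.Sequential(
    torch.nn.Embedding(vocab_size, 64),
    torch.nn.Linear(64, vocab_size),
)
batch = torch.randint(0, vocab_size, (2, 16))  # 2 sequences of 16 token ids
print(next_token_loss(toy_model, batch))       # ~log(1000) ≈ 6.9 before training
```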
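And since Codestral 22B is served through Mistral's API, here is a hedged sketch of asking it a code question over an OpenAI-style chat completions endpoint. The endpoint URL, model identifier, and response shape are assumptions based on Mistral's API conventions; verify them against the current documentation at https://docs.mistral.ai before use.

```python
import os
import requests

# Assumed endpoint and model id for Codestral; check the docs for current values.
resp = requests.post(
    "https://api.mistral.ai/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['MISTRAL_API_KEY']}"},
    json={
        "model": "codestral-latest",  # assumed identifier for Codestral 22B
        "messages": [
            {"role": "user",
             "content": "Write a Python function that reverses a linked list."},
        ],
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```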
Source: AI Engineer via YouTube
❓ What do you think of the ideas shared in this video? Share your thoughts in the comments!