Exploring Mistral AI's Open Source AI Models: From Mistral 7B to Codestral 22B

Welcome to our journey through Mistral AI's open-sourced models, where they're democratizing frontier AI for developers and reshaping the way we train, fine-tune, and deploy language models.

  • 1. The speaker is excited to talk about Mistral AI's open models.
  • 2. Mistral AI started last June and released its first open model, Mistral 7B, in September.
  • 3. In December, they released a mixture-of-experts open model, Mixtral 8x7B.
  • 4. They also released their platform with model APIs and the commercial models Mistral Medium and Mistral Embed.
  • 5. In February, they released their flagship model, Mistral Large, which has superior reasoning and math abilities.
  • 6. In April, they released a new open model, Mixtral 8x22B.
  • 7. In June, they released a code-specific model called Codestral 22B, available for free use on their chat interface.
  • 8. The mission of Mistral AI is to bring frontier AI into everyone's hands, with a focus on building cutting-edge AI for developers.
  • 9. They follow a set of principles when training and releasing models: openness, portability, optimized performance, and customizability.
  • 10. Openness: they train best-in-class open models and release them to the open source community.
  • 11. Portability: all models are available on Azure, AWS, and GCP, can run in a virtual private cloud, and can be deployed on-premises with full control over security and privacy.
  • 12. Optimized performance: they aim for an optimal performance-to-size ratio in their models.
  • 13. Customizability: they provide libraries and tools to adapt their models to specific applications.
  • 14. Mistral AI has open sourced three models in the past year: a dense Transformer model (Mistral 7B), a sparse mixture-of-experts model (Mixtral 8x7B), and a larger version of the latter (Mixtral 8x22B).
  • 15. Mistral 7B was the first model of its size to reach 60% on MMLU, which the speaker considers the bare minimum for a useful model.
  • 16. The sparse mixture-of-experts architecture pushes performance while keeping the inference budget in check, because only a small subset of parameters is activated for each token (see the routing sketch after this list).
  • 17. Mistral AI sees open source as complementary to profit, not in competition with it.
  • 18. Open sourcing their models serves as a branding and marketing tool, helping them create awareness about their products and acquire customers.
  • 19. The open models are trained in three stages: pre-training, instruction tuning, and learning from human feedback.
  • 20. Pre-training teaches the model to predict the next token in a sequence of text by training on massive datasets, on the order of trillions of tokens (the objective is sketched after this list).
  • 21. Instruction tuning uses prompt-response pairs to teach the model how humans want to interact with it.
  • 22. Learning from human feedback scales data collection faster and optimizes the model against human preferences (see the preference-learning sketch below).
  • 23. Mistral AI's open models, including Codestral 22B, have been trained with these techniques.
  • 24. Codestral 22B is a dense Transformer model designed specifically for code, fluent in 80+ programming languages and offering both code completion and question-answering capabilities (a completion example closes this section).
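To make point 16 concrete, here is a minimal PyTorch sketch of sparse mixture-of-experts routing in the spirit of Mixtral's top-2-of-8 design. The layer sizes, class name, and per-expert loop are illustrative assumptions, not Mistral AI's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoELayer(nn.Module):
    """Top-k routing: each token is sent to only top_k of n_experts FFNs."""

    def __init__(self, dim=512, hidden=2048, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.gate = nn.Linear(dim, n_experts, bias=False)  # the router
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(dim, hidden), nn.SiLU(), nn.Linear(hidden, dim))
            for _ in range(n_experts)
        ])

    def forward(self, x):                        # x: (tokens, dim)
        logits = self.gate(x)                    # (tokens, n_experts)
        weights, picked = logits.topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)     # renormalize over chosen experts
        out = torch.zeros_like(x)
        # Each token passes through only top_k experts, so compute per token
        # scales with top_k, not with the total parameter count.
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = picked[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

layer = SparseMoELayer()
print(layer(torch.randn(4, 512)).shape)  # torch.Size([4, 512])
```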
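For points 19-21: pre-training and instruction tuning share the same next-token prediction objective; instruction tuning simply restricts supervision to the response tokens. A hedged sketch of that generic recipe (the stand-in model, vocabulary size, and masking helper are made up for illustration, not Mistral AI's training code):

```python
import torch
import torch.nn.functional as F

def next_token_loss(model, tokens, prompt_len=0):
    # tokens: (batch, seq) integer ids. The model maps ids to logits of shape
    # (batch, seq, vocab). With prompt_len=0 every position is supervised
    # (pre-training); with prompt_len>0 only tokens after the prompt
    # contribute to the loss (instruction tuning on prompt-response pairs).
    logits = model(tokens[:, :-1])              # predict token t+1 from tokens <= t
    targets = tokens[:, 1:].clone()
    if prompt_len > 0:
        targets[:, : prompt_len - 1] = -100    # cross_entropy ignores -100
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        targets.reshape(-1),
        ignore_index=-100,
    )

# Stand-in "model" for demonstration only: random logits over a 32k vocabulary.
fake_model = lambda ids: torch.randn(ids.shape[0], ids.shape[1], 32_000)
batch = torch.randint(0, 32_000, (2, 16))
print(next_token_loss(fake_model, batch))                # pre-training style
print(next_token_loss(fake_model, batch, prompt_len=6))  # instruction-tuning style
```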
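Point 22 doesn't name an algorithm; "learning from human feedback" covers several techniques. Direct Preference Optimization (DPO) is one widely used way to optimize against human preference pairs, so the sketch below shows that approach as an example, not necessarily what Mistral AI uses:

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    # Each argument is the summed log-probability of a complete response under
    # the policy being trained or under a frozen reference model. The loss
    # widens the margin by which the policy prefers the human-chosen response,
    # measured relative to the reference model.
    margin = (policy_chosen_logp - ref_chosen_logp) \
           - (policy_rejected_logp - ref_rejected_logp)
    return -F.logsigmoid(beta * margin).mean()

# Toy values: the policy already slightly prefers the chosen response.
print(dpo_loss(torch.tensor([-12.0]), torch.tensor([-15.0]),
               torch.tensor([-13.0]), torch.tensor([-14.0])))
```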
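Finally, for point 24: Codestral 22B's weights are published on Hugging Face (access-gated, under the Mistral AI Non-Production License). A minimal code-completion example using the transformers library, assuming you have access to the mistralai/Codestral-22B-v0.1 checkpoint; otherwise swap in whatever endpoint or weights you actually use:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Codestral-22B-v0.1"  # gated checkpoint; requires accepted access
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Greedy completion of a function signature.
prompt = "def fibonacci(n: int) -> int:\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```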

Source: AI Engineer via YouTube
