Exploring Mistral AI's Open-Source Models: A Year in Review

Exploring the Open Models of Mistral AI: From Pre-Training to Fine-Tuning, Uncovering the Secrets Behind Its Cutting-Edge Language Models.

  • 1. The speaker is excited to talk about Mistral AI's open models.
  • 2. Mistral AI was founded in June of the previous year (2023).
  • 3. They released their first open model, Mistral 7B, in September.
  • 4. In December, they released their first mixture-of-experts open model, Mixtral 8x7B.
  • 5. Along with these models, they also released their platform with model APIs and commercial models.
  • 6. In February of the current year (2024), they released Mistral Large, their flagship model.
  • 7. In April, they released a new open model, Mixtral 8x22B.
  • 8. Most recently, in June, they released Codestral 22B, a code-specific model.
  • 9. Mistral AI's mission is to bring frontier AI to everyone, with a specific focus on building cutting-edge AI for developers.
  • 10. They have certain principles when training and releasing models: openness, portability, performance optimization, and customizability.
  • 11. Their first model, Mistral 7B, was a dense transformer model and the first 7B model to reach 60% on MMLU.
  • 12. In December, they released their first sparse mixture-of-experts model, built on the mixture-of-experts (MoE) architecture.
  • 13. The sparse mixture-of-experts model has a larger total parameter count but activates only a small subset of those parameters for each token, making it fast and cost-efficient at inference time (see the routing sketch after this list).
  • 14. In April, they released a bigger version of this architecture, Mixtral 8x22B, which has even better performance, a longer context window, and multilingual support.
  • 15. The speaker explains that open sourcing their models serves as a branding and marketing tool for them.
  • 16. Open source helps create awareness about their products, acquire customers, and learn from the community on how to customize or deploy their models in new settings.
  • 17. Large language models (LLMs) are typically trained in three stages: pre-training, instruction tuning, and learning from human feedback.
  • 18. Pre-training involves passing a piece of text through the model and training it to predict the next token (a minimal sketch follows this list).
  • 19. The datasets used for pre-training are huge and require pre-processing, cleaning, deduplication, and curation (see the deduplication sketch after this list).
  • 20. Pre-training requires a lot of investment, as models can reach hundreds of billions or even trillions of parameters.
  • 21. Each model takes tens to hundreds of millions of dollars to train, and it's difficult to get the investment for another training run if something goes wrong.
  • 22. The best hyperparameters for smaller models might not be the best for larger models.
  • 23. In instruction tuning, the model is trained on prompt-response pairs rather than plain strings of text (see the formatting sketch after this list).
  • 24. Learning from human feedback is based on collecting human preference judgments, which are cheaper and easier to obtain than full human-written annotations (a preference-loss sketch closes out the examples below).
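
Item 13 describes how a sparse mixture-of-experts layer activates only a few experts per token. Below is a minimal NumPy sketch of top-2 routing in that spirit; the dimensions, the ReLU expert MLPs, and the router are illustrative assumptions, not Mistral AI's actual implementation.

```python
# Minimal sketch of sparse mixture-of-experts (MoE) routing with a top-2 router.
# All shapes and weights are illustrative, not Mistral AI's implementation.
import numpy as np

rng = np.random.default_rng(0)

d_model, d_ff = 8, 16          # tiny dimensions for illustration
n_experts, top_k = 8, 2        # 8 experts, only 2 active per token

# Each "expert" is a small two-layer MLP with its own weights.
experts = [
    (rng.normal(size=(d_model, d_ff)), rng.normal(size=(d_ff, d_model)))
    for _ in range(n_experts)
]
router_w = rng.normal(size=(d_model, n_experts))   # router projection

def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def moe_layer(tokens):
    """tokens: (n_tokens, d_model) -> (n_tokens, d_model)."""
    logits = tokens @ router_w                      # (n_tokens, n_experts)
    out = np.zeros_like(tokens)
    for i, tok in enumerate(tokens):
        # Pick the top-k experts for this token only.
        topk = np.argsort(logits[i])[-top_k:]
        gates = softmax(logits[i][topk])            # renormalise over chosen experts
        for gate, e in zip(gates, topk):
            w1, w2 = experts[e]
            out[i] += gate * (np.maximum(tok @ w1, 0) @ w2)   # ReLU expert MLP
    return out

tokens = rng.normal(size=(4, d_model))
print(moe_layer(tokens).shape)   # (4, 8): only 2 of 8 experts ran per token
```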
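
Item 18 describes the pre-training objective. The sketch below shows next-token prediction as a shifted cross-entropy loss; the "model" is a random stand-in, since the point is only the shape of the objective.

```python
# Minimal sketch of the pre-training objective: predict token t+1 from tokens
# up to t and minimise cross-entropy. The "model" is a random placeholder.
import numpy as np

rng = np.random.default_rng(0)
vocab_size = 100
token_ids = rng.integers(0, vocab_size, size=12)     # one training sequence

def fake_model(tokens):
    # Placeholder: a real model returns one logit vector per input position.
    return rng.normal(size=(len(tokens), vocab_size))

inputs, targets = token_ids[:-1], token_ids[1:]       # shift by one position
logits = fake_model(inputs)

# Cross-entropy of the true next token at every position.
logits = logits - logits.max(axis=-1, keepdims=True)
log_probs = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
loss = -log_probs[np.arange(len(targets)), targets].mean()
print(f"next-token loss: {loss:.3f}")
```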
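
Item 19 mentions deduplication as one pre-processing step. Here is a minimal sketch of exact deduplication by hashing normalised text; production pipelines typically add fuzzy methods such as MinHash, which are not shown and are not claimed to be Mistral AI's approach.

```python
# Minimal sketch of exact document deduplication by hashing normalised text.
import hashlib

docs = [
    "The quick brown fox.",
    "the   quick brown fox.",        # duplicate after normalisation
    "A completely different document.",
]

def normalise(text: str) -> str:
    return " ".join(text.lower().split())

seen, unique_docs = set(), []
for doc in docs:
    digest = hashlib.sha256(normalise(doc).encode("utf-8")).hexdigest()
    if digest not in seen:
        seen.add(digest)
        unique_docs.append(doc)

print(unique_docs)   # the near-identical second document is dropped
```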
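
Item 23 contrasts instruction tuning with raw-text pre-training. The sketch below assumes the common convention of concatenating prompt and response and masking the prompt positions in the labels so the loss covers only the response; the token ids and the -100 ignore value are illustrative, not Mistral AI's exact format.

```python
# Minimal sketch of instruction-tuning data: loss is computed on response
# tokens only, using an "ignore" label for prompt positions.
IGNORE = -100   # label value that the loss function skips

def build_example(prompt_ids, response_ids):
    """Concatenate prompt and response; mask prompt positions in the labels."""
    input_ids = list(prompt_ids) + list(response_ids)
    labels = [IGNORE] * len(prompt_ids) + list(response_ids)
    return input_ids, labels

prompt_ids = [12, 45, 7]          # e.g. a tokenised instruction
response_ids = [88, 19, 3]        # e.g. the tokenised desired answer
input_ids, labels = build_example(prompt_ids, response_ids)
print(input_ids)   # [12, 45, 7, 88, 19, 3]
print(labels)      # [-100, -100, -100, 88, 19, 3]
```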
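
Item 24 covers learning from human preferences. One common way to turn a "chosen vs. rejected" judgment into a training signal is a Bradley-Terry style pairwise loss, sketched below with scalar scores from a hypothetical reward model; this is a standard recipe, not necessarily the one Mistral AI uses.

```python
# Minimal sketch of a pairwise preference loss: the "chosen" response should
# score higher than the "rejected" one. Scores come from a hypothetical reward model.
import math

def preference_loss(score_chosen: float, score_rejected: float) -> float:
    # -log sigmoid(score_chosen - score_rejected)
    margin = score_chosen - score_rejected
    return math.log(1.0 + math.exp(-margin))

print(preference_loss(2.0, 0.5))   # small loss: ranking already correct
print(preference_loss(0.5, 2.0))   # large loss: model prefers the wrong answer
```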

Source: AI Engineer via YouTube

❓ What do you think? What role do you think open-source models can play in democratizing access to cutting-edge AI technologies, and how can they ultimately benefit humanity? Feel free to share your thoughts in the comments!