Exploring Mistral AI's Open-Source Models: A Year in Review

Exploring the Open Models of Mistral AI: From Pre-Training to Fine-Tuning, Uncovering the Secrets Behind Its Cutting-Edge Language Models.

  • 1. The speaker is excited to talk about Mistral AI's open models.
  • 2. Mistral AI was founded in June of the previous year (2023).
  • 3. They released their first open model, Mistral 7B, in September.
  • 4. In December, they released their first mixture-of-experts open model, Mixtral 8x7B.
  • 5. Along with these models, they also released their platform with model APIs and commercial models.
  • 6. In February of the current year (2024), they released Mistral Large, their flagship model.
  • 7. In April, they released a new open model, Mixtral 8x22B.
  • 8. Most recently, in June, they released Codestral 22B, a code-specific model.
  • 9. Mistral AI's mission is to bring frontier AI to everyone, with a specific focus on building cutting-edge AI for developers.
  • 10. They have certain principles when training and releasing models: openness, portability, performance optimization, and customizability.
  • 11. Their first model, Mistral 7B, was a dense transformer model and the first 7B model to reach 60% on MMLU.
  • 12. In December, they released their first sparse mixture-of-experts model, built on the mixture-of-experts (MoE) architecture.
  • 13. The sparse mixture-of-experts model has a larger total parameter count but activates only a small subset of those parameters for each token, making it fast and cost-efficient at inference time (see the routing sketch after this list).
  • 14. In April, they released a bigger version of this architecture, Mixtral 8x22B, which has even better performance, a longer context window, and multilingual support.
  • 15. The speaker explains that open sourcing their models serves as a branding and marketing tool for them.
  • 16. Open source helps create awareness about their products, acquire customers, and learn from the community on how to customize or deploy their models in new settings.
  • 17. Large language models (LLMs) are typically trained in three stages: pre-training, instruction tuning, and learning from human feedback.
  • 18. Pre-training involves passing a piece of text through the model and training it to predict the next token (a minimal sketch follows this list).
  • 19. The datasets used for pre-training are huge and require pre-processing, cleaning, deduplication, and curation (see the deduplication sketch after this list).
  • 20. Pre-training requires a lot of investment, as models can reach hundreds of billions or even trillions of parameters.
  • 21. Each model takes tens to hundreds of millions of dollars to train, and it's difficult to get the investment for another training run if something goes wrong.
  • 22. The best hyperparameters for smaller models might not be the best for larger models.
  • 23. In instruction tuning, the model is trained on prompt-response pairs rather than plain strings of text (see the formatting sketch after this list).
  • 24. Learning from human feedback is based on collecting human preference judgments, which are cheaper and easier to obtain than full human-written annotations (a preference-loss sketch closes out the examples below).
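
Item 13 describes how a sparse mixture-of-experts layer activates only a few experts per token. Below is a minimal NumPy sketch of top-2 routing in that spirit; the dimensions, the ReLU expert MLPs, and the router are illustrative assumptions, not Mistral AI's actual implementation.

```python
# Minimal sketch of sparse mixture-of-experts (MoE) routing with a top-2 router.
# All shapes and weights are illustrative, not Mistral AI's implementation.
import numpy as np

rng = np.random.default_rng(0)

d_model, d_ff = 8, 16          # tiny dimensions for illustration
n_experts, top_k = 8, 2        # 8 experts, only 2 active per token

# Each "expert" is a small two-layer MLP with its own weights.
experts = [
    (rng.normal(size=(d_model, d_ff)), rng.normal(size=(d_ff, d_model)))
    for _ in range(n_experts)
]
router_w = rng.normal(size=(d_model, n_experts))   # router projection

def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

def moe_layer(tokens):
    """tokens: (n_tokens, d_model) -> (n_tokens, d_model)."""
    logits = tokens @ router_w                      # (n_tokens, n_experts)
    out = np.zeros_like(tokens)
    for i, tok in enumerate(tokens):
        # Pick the top-k experts for this token only.
        topk = np.argsort(logits[i])[-top_k:]
        gates = softmax(logits[i][topk])            # renormalise over chosen experts
        for gate, e in zip(gates, topk):
            w1, w2 = experts[e]
            out[i] += gate * (np.maximum(tok @ w1, 0) @ w2)   # ReLU expert MLP
    return out

tokens = rng.normal(size=(4, d_model))
print(moe_layer(tokens).shape)   # (4, 8): only 2 of 8 experts ran per token
```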
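
Item 18 describes the pre-training objective. The sketch below shows next-token prediction as a shifted cross-entropy loss; the "model" is a random stand-in, since the point is only the shape of the objective.

```python
# Minimal sketch of the pre-training objective: predict token t+1 from tokens
# up to t and minimise cross-entropy. The "model" is a random placeholder.
import numpy as np

rng = np.random.default_rng(0)
vocab_size = 100
token_ids = rng.integers(0, vocab_size, size=12)     # one training sequence

def fake_model(tokens):
    # Placeholder: a real model returns one logit vector per input position.
    return rng.normal(size=(len(tokens), vocab_size))

inputs, targets = token_ids[:-1], token_ids[1:]       # shift by one position
logits = fake_model(inputs)

# Cross-entropy of the true next token at every position.
logits = logits - logits.max(axis=-1, keepdims=True)
log_probs = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
loss = -log_probs[np.arange(len(targets)), targets].mean()
print(f"next-token loss: {loss:.3f}")
```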
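
Item 19 mentions deduplication as one pre-processing step. Here is a minimal sketch of exact deduplication by hashing normalised text; production pipelines typically add fuzzy methods such as MinHash, which are not shown and are not claimed to be Mistral AI's approach.

```python
# Minimal sketch of exact document deduplication by hashing normalised text.
import hashlib

docs = [
    "The quick brown fox.",
    "the   quick brown fox.",        # duplicate after normalisation
    "A completely different document.",
]

def normalise(text: str) -> str:
    return " ".join(text.lower().split())

seen, unique_docs = set(), []
for doc in docs:
    digest = hashlib.sha256(normalise(doc).encode("utf-8")).hexdigest()
    if digest not in seen:
        seen.add(digest)
        unique_docs.append(doc)

print(unique_docs)   # the near-identical second document is dropped
```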
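
Item 23 contrasts instruction tuning with raw-text pre-training. The sketch below assumes the common convention of concatenating prompt and response and masking the prompt positions in the labels so the loss covers only the response; the token ids and the -100 ignore value are illustrative, not Mistral AI's exact format.

```python
# Minimal sketch of instruction-tuning data: loss is computed on response
# tokens only, using an "ignore" label for prompt positions.
IGNORE = -100   # label value that the loss function skips

def build_example(prompt_ids, response_ids):
    """Concatenate prompt and response; mask prompt positions in the labels."""
    input_ids = list(prompt_ids) + list(response_ids)
    labels = [IGNORE] * len(prompt_ids) + list(response_ids)
    return input_ids, labels

prompt_ids = [12, 45, 7]          # e.g. a tokenised instruction
response_ids = [88, 19, 3]        # e.g. the tokenised desired answer
input_ids, labels = build_example(prompt_ids, response_ids)
print(input_ids)   # [12, 45, 7, 88, 19, 3]
print(labels)      # [-100, -100, -100, 88, 19, 3]
```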
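
Item 24 covers learning from human preferences. One common way to turn a "chosen vs. rejected" judgment into a training signal is a Bradley-Terry style pairwise loss, sketched below with scalar scores from a hypothetical reward model; this is a standard recipe, not necessarily the one Mistral AI uses.

```python
# Minimal sketch of a pairwise preference loss: the "chosen" response should
# score higher than the "rejected" one. Scores come from a hypothetical reward model.
import math

def preference_loss(score_chosen: float, score_rejected: float) -> float:
    # -log sigmoid(score_chosen - score_rejected)
    margin = score_chosen - score_rejected
    return math.log(1.0 + math.exp(-margin))

print(preference_loss(2.0, 0.5))   # small loss: ranking already correct
print(preference_loss(0.5, 2.0))   # large loss: model prefers the wrong answer
```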

Source: AI Engineer via YouTube

❓ What do you think? What role do you think open-source models can play in democratizing access to cutting-edge AI technologies, and how can they ultimately benefit humanity? Feel free to share your thoughts in the comments!