Exponential Growth in Large Language Models: Training Finance Experts with Domain-Specific Models & Extended Context
As Chief Scientific Officer at Gradient, I'll dive into how we trained large language models to become finance experts and discuss two key solutions: a domain-specific finance language model and an extended-context-length model that addresses hallucinations.
- 1. Leo is the Chief Scientific Officer at Gradient.
- 2. He will discuss training large language models to be finance experts.
- 3. Foundational models have been growing at an exponential rate.
- 4. Different companies have their own flavor of a language model, each with its own features and use cases.
- 5. Context length in language models has increased significantly over the past year.
- 6. The largest context lengths were around 100k tokens a year ago but have grown to about 40 times that now.
- 7. Large language models are not one-size-fits-all, especially for complicated use cases.
- 8. Generalist or base language models won't get you very far in more complex scenarios.
- 9. Gradient has built an AI Foundry, a collection of custom language models and workflow primitives.
- 10. These pieces are combined to create solutions tailored for their customers.
- 11. Today, Leo will focus on solutions for the finance domain: building financial experts.
- 12. Two components have been particularly useful in building these solutions: a domain-specific finance language model and context length extension (see the context-extension sketch after this list).
- 13. Six requirements were identified for finance applications of language models, requirements that generalist models often fall short on.
- 14. Domain knowledge is crucial because general-purpose language models are trained on a broad set of data but may not handle specific, technical financial information well.
- 15. Thousands of documents relevant to a topic are needed in the model's pre-training data to reach decent accuracy on it.
- 16. To train a finance-specific language model, an automated data pipeline is required due to the vast amount of financial data available.
- 17. Automated data curation borrows ideas from the membership-inference literature to check whether the model has already seen a document during its original training (see the filtering sketch after this list).
- 18. After filtering out seen data, human review and synthetic data augmentation are used to create a manageable dataset for training.
- 19. Continued pre-training and alignment (supervised fine-tuning and preference optimization) make up the training pipeline (see the pre-training sketch after this list).
- 20. Pre-training is like reading textbooks, while alignment is like instructing the model on best practices for using that information.
- 21. Extended context length helps address hallucinations in language models.
- 22. Hallucinations occur when a model generates content that is irrelevant to or inconsistent with the input data.
- 23. Causes of hallucinations include outdated training data, automated data collection issues, and source reference divergence.
- 24. In-context learning is the most direct and sample-efficient way to reduce hallucinations: introduce small amounts of relevant information into the prompt at inference time (see the grounding sketch after this list).
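
Point 12 mentions context length extension without the mechanism. One widely used approach (not necessarily the one Gradient used) is to enlarge the base frequency of rotary position embeddings (RoPE) so that rotation angles at very long positions stay in the range the model saw during its original training. A minimal sketch, with illustrative dimensions and base values:

```python
# Sketch of one common context-extension idea: enlarging RoPE's base frequency
# ("theta") so long positions map to in-range rotation angles. Values are
# illustrative and not tied to any specific model from the talk.
import numpy as np

def rope_angles(position: int, dim: int = 128, base: float = 10_000.0) -> np.ndarray:
    """Rotary-embedding angles for one position: angle_i = position / base^(2i/dim)."""
    inv_freq = base ** (-np.arange(0, dim, 2) / dim)
    return position * inv_freq

# At position 100k, the slowest-rotating components exceed a full turn under the
# default base, but stay well under one turn with an enlarged base.
print(rope_angles(100_000, base=10_000.0)[-4:])
print(rope_angles(100_000, base=4_000_000.0)[-4:])
```

With a larger base, far-away positions rotate more slowly and therefore look more like the positions the model was originally trained on, which is why a relatively short fine-tuning run can teach it to use much longer inputs.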
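
For point 17, a membership-inference-style check can be as simple as scoring each candidate document's perplexity under the base model: text the model has effectively memorized tends to score unusually low. A minimal sketch assuming the Hugging Face transformers library; the model name, threshold, and helper names are placeholders, not Gradient's actual pipeline:

```python
# Sketch of a perplexity-based "has the model already seen this?" filter,
# borrowing the membership-inference intuition from point 17. Model name,
# threshold, and document contents are illustrative assumptions.
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "meta-llama/Meta-Llama-3-8B"  # assumed base model

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, torch_dtype=torch.bfloat16)
model.eval()

@torch.no_grad()
def perplexity(text: str, max_tokens: int = 2048) -> float:
    """Token-level perplexity of `text` under the base model."""
    ids = tokenizer(text, return_tensors="pt",
                    truncation=True, max_length=max_tokens).input_ids
    # labels=input_ids makes the model return the average next-token loss
    loss = model(ids, labels=ids).loss
    return math.exp(loss.item())

def looks_unseen(doc: str, threshold: float = 12.0) -> bool:
    """Heuristic: memorized documents tend to have unusually low perplexity,
    so keep only documents above a (tuned) threshold."""
    return perplexity(doc) > threshold

corpus = ["... 10-K filing text ...", "... earnings call transcript ..."]
fresh_docs = [d for d in corpus if looks_unseen(d)]
```

Documents that pass the filter then go through the human review and synthetic augmentation described in point 18.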
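
Points 19-20 split training into continued pre-training followed by alignment. The sketch below covers only the pre-training half (the "reading textbooks" stage): a plain next-token objective over the curated corpus, assuming the Hugging Face Trainer. The corpus path, base model, and hyperparameters are illustrative, not Gradient's recipe, and the alignment stages (supervised fine-tuning, then preference optimization such as DPO) would run afterwards on the resulting checkpoint.

```python
# Sketch of continued pre-training on a curated finance corpus (points 19-20).
# Corpus path, model name, and hyperparameters are illustrative assumptions.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

MODEL_NAME = "meta-llama/Meta-Llama-3-8B"               # assumed base model
DATA_FILES = {"train": "curated_finance_corpus.jsonl"}  # assumed curated corpus

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # the collator needs a pad token
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

raw = load_dataset("json", data_files=DATA_FILES)["train"]

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=4096)

train_ds = raw.map(tokenize, batched=True, remove_columns=raw.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="finance-cpt",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=16,
        learning_rate=2e-5,
        num_train_epochs=1,
        bf16=True,
    ),
    train_dataset=train_ds,
    # mlm=False selects standard causal (next-token) language modeling
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```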
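
Point 24's in-context learning reduces hallucinations by placing the relevant, current documents directly in the prompt, so the model answers from them rather than from stale training data. A minimal sketch assuming an OpenAI-compatible chat client; the retrieval step, model name, and document contents are placeholders:

```python
# Sketch of grounding answers with in-context documents (point 24).
# The client, model name, and retrieval step are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # any OpenAI-compatible endpoint

def retrieve_documents(question: str) -> list[str]:
    """Placeholder retrieval step, e.g. a vector store over recent filings."""
    return ["Q1 2024 10-Q excerpt: revenue was ...", "Latest 8-K: ..."]

def answer_with_context(question: str) -> str:
    docs = retrieve_documents(question)
    context = "\n\n".join(f"[Doc {i + 1}] {d}" for i, d in enumerate(docs))
    messages = [
        {"role": "system",
         "content": "Answer only from the documents provided. "
                    "If the answer is not in them, say you don't know."},
        {"role": "user", "content": f"Documents:\n{context}\n\nQuestion: {question}"},
    ]
    resp = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
    return resp.choices[0].message.content

print(answer_with_context("What was revenue in Q1 2024?"))
```

An extended-context model (point 21) helps here, since more and longer source documents can fit into the prompt at once.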
Source: AI Engineer via YouTube