Fixing Bugs in Llama 3: A Pre-Release Update

Join me as we dive into the world of open-source AI models, exploring common bugs and fixes, including those in Llama 3, and learning how to overcome challenges like double BOS tokens, incorrect model usage, and more.

  • 1. The speaker will discuss fixing bugs in open source models, specifically focusing on Llama 3 bugs.
  • 2. Slides from the previous day's workshop can be found at tinyurl.com/workshopslides and additional resources at tinyurl.com/unof.
  • 3. The speaker has experience fixing bugs in Google's open source model, Gemma.
  • 4. One common issue in fine-tuning language models is tokenization problems.
  • 5. There are eight bugs in Llama 3; some have not been announced yet.
  • 6. First bug: Do not use double Beginning of Sequence (BOS) tokens when fine-tuning, as they can reduce accuracy (see the double-BOS check sketched below).
  • 7. The Unsloth library can help fix issues with double BOS tokens.
  • 8. Second bug: Be cautious of memory usage and sequence length in models.
  • 9. Loading models in 4-bit saves memory; set the max sequence length carefully, since memory usage scales with it (see the loading sketch below).
  • 10. Unsloth supports fine-tuning various models, including Llama, Mistral, Gemma, and more.
  • 11. When fine-tuning, use powers of two for the LoRA rank, and avoid overly large ranks to prevent overfitting and excessive memory usage (see the LoRA configuration sketch below).
  • 12. Make sure to include all the target modules (the q, k, v, o, gate, up, and down projections) when fine-tuning.
  • 13. Unsloth has a gradient checkpointing option called "unsloth" for long-context fine-tuning.
  • 14. Customizable chat templates are supported in Unsloth, allowing users to merge multiple dataset columns into one for training.
  • 15. Users should be careful with chat template repetitions; two iterations are recommended for the Llama 3 chat template.
  • 16. Set a batch size of 2 and gradient accumulation of 4 for fine-tuning in Unsloth (see the trainer sketch below).
  • 17. The effective batch size is calculated as batch size times gradient accumulation steps (e.g., 2 x 4 = 8).
  • 18. Use a learning rate of 2e-4 or smaller for fine-tuning.
  • 19. After fine-tuning, use the tokenizer's apply_chat_template for inference, ensuring no double BOS tokens are added.
  • 20. Unsloth now supports saving to multiple GGUF files in one go for easier model management (see the GGUF export sketch below).
  • 21. The Ollama chat template notebook can be found at tinyurl.com/unsoft2.
  • 22. Don't forget to join the speaker's Discord Channel for further questions and discussions about AI and bug fixes.

Source: AI Engineer via YouTube
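
Here is a minimal sketch of the double-BOS issue from points 6, 7, and 19. It assumes a Llama 3 Instruct checkpoint (the repo name is only an example); the key point is that apply_chat_template already prepends the BOS token, so tokenizing its output again with add_special_tokens=True produces a second one.

```python
from transformers import AutoTokenizer

# Any Llama 3 Instruct checkpoint behaves the same way; this repo name is just an example.
tokenizer = AutoTokenizer.from_pretrained("unsloth/llama-3-8b-Instruct")

messages = [{"role": "user", "content": "Hello!"}]

# apply_chat_template already prepends <|begin_of_text|> (the BOS token).
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

ids_double = tokenizer(text, add_special_tokens=True).input_ids   # BOS added again -> double BOS
ids_single = tokenizer(text, add_special_tokens=False).input_ids  # single BOS, as intended

print(ids_double[:2])  # two copies of the BOS id back to back
print(ids_single[:2])  # one BOS id, then the first real token
```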
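
For points 8–10, this is a sketch of loading a pre-quantized 4-bit Llama 3 with Unsloth and an explicit max sequence length; the checkpoint name and the length of 2048 are illustrative choices, not values prescribed by the talk.

```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3-8b-bnb-4bit",  # pre-quantized 4-bit checkpoint
    max_seq_length=2048,   # activation/KV memory scales with this, so size it to your data
    dtype=None,            # auto-detect (bfloat16 on recent GPUs)
    load_in_4bit=True,     # 4-bit loading saves a large amount of VRAM
)
```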
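
Points 11–13 translate into a LoRA configuration like the one below: a power-of-two rank, all seven projection modules, and Unsloth's "unsloth" gradient checkpointing for long context. The specific values (r=16, lora_alpha=16) are illustrative.

```python
model = FastLanguageModel.get_peft_model(
    model,
    r=16,  # keep the rank a power of two; very large ranks overfit and waste memory
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",  # long-context fine-tuning option
)
```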
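
Points 16–18 map onto trainer settings roughly as follows. The exact SFTTrainer keyword arguments vary between trl versions, and `dataset` is a placeholder for your own dataset with a "text" column.

```python
import torch
from trl import SFTTrainer
from transformers import TrainingArguments

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,        # placeholder: a Dataset with a "text" column
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,   # batch size of 2
        gradient_accumulation_steps=4,   # effective batch size = 2 x 4 = 8
        learning_rate=2e-4,              # 2e-4 or smaller
        num_train_epochs=1,
        fp16=not torch.cuda.is_bf16_supported(),
        bf16=torch.cuda.is_bf16_supported(),
        optim="adamw_8bit",
        output_dir="outputs",
    ),
)
trainer.train()
```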
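
Finally, for point 20, a sketch of exporting to GGUF with Unsloth. The list form is how I read the "multiple GGUF files" feature mentioned in the talk, and the output directory and quantization methods here are only examples.

```python
# Export the fine-tuned model to GGUF. Passing several quantization methods
# produces several .gguf files in one call.
model.save_pretrained_gguf(
    "llama3-finetune-gguf",                  # output directory (example name)
    tokenizer,
    quantization_method=["q4_k_m", "q8_0"],  # example quantization choices
)
```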

❓ What do you think? What does the experience of fine-tuning AI models, like LLaMA 3, reveal about the complexities and challenges of artificial intelligence in our daily lives? Feel free to share your thoughts in the comments!