Fixing Bugs in Llama 3: A Pre-Release Update
Join me as we dive into the world of open-source AI models, exploring common bugs and fixes, including those in Llama 3, and learn how to overcome challenges like double BOS tokens, incorrect model usage, and more.
- 1. The speaker will discuss fixing bugs in open-source models, specifically focusing on Llama 3.
- 2. Slides from the previous day's workshop can be found at tinyurl.com/workshopslides and additional resources at tinyurl.com/unof.
- 3. The speaker has experience fixing bugs in Google's open source model, Gemma.
- 4. One common issue in fine-tuning language models is tokenization problems.
- 5. There are eight bugs in Llama 3; some have not been announced yet.
- 6. First bug: Do not use double beginning-of-sequence (BOS) tokens when fine-tuning, as they reduce accuracy (see the BOS-check sketch after this list).
- 7. Unsloth can help fix issues with double BOS tokens.
- 8. Second bug: Watch memory usage and sequence length when loading models.
- 9. Loading models in 4-bit saves memory; set the max sequence length to what your data actually needs, since memory usage scales with it (see the loading sketch after this list).
- 10. Unsloth supports fine-tuning various models, including Llama, Mistral, Gemma, and more.
- 11. When fine-tuning, use powers of two for the LoRA rank, and avoid overly large ranks to prevent overfitting and excessive memory usage.
- 12. Make sure to include all the target modules: the attention projections (q_proj, k_proj, v_proj, o_proj) and the MLP projections (gate_proj, up_proj, down_proj); see the LoRA sketch after this list.
- 13. Unsloth has a gradient-checkpointing mode called "unsloth" for long-context fine-tuning (used in the LoRA sketch after this list).
- 14. Customizable chat templates are supported in Unsloth, allowing users to merge multiple dataset columns into one text field for training (see the chat-template sketch after this list).
- 15. Be careful with chat-template repetitions; writing out two iterations of the user/assistant exchange is recommended for the Llama 3 chat template so the repeating structure is unambiguous.
- 16. Set a batch size of 2 and gradient accumulation steps of 4 for fine-tuning in Unsloth (see the trainer sketch after this list).
- 17. The effective batch size is calculated as batch size times gradient accumulation steps (e.g., 2 x 4 = 8).
- 18. Use a learning rate of 2e-4 or smaller for fine-tuning.
- 19. After fine-tuning, use apply_chat_template for inference, and verify the prompt does not start with a double BOS token (see the inference sketch after this list).
- 20. Unsloth now supports saving multiple GGUF (.gguf) files, e.g. several quantization formats at once, for easier model management (see the GGUF sketch after this list).
- 21. The Ollama chat template notebook can be found at tinyurl.com/unsoft2.
- 22. Don't forget to join the speaker's Discord channel for further questions and discussion about AI and bug fixes.
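The code sketches referenced above follow. First, the BOS check from item 6: a minimal sketch using the Hugging Face tokenizers API. The model ID is only an example, and the check assumes the tokenizer prepends BOS by default, as the Llama 3 tokenizer does.

```python
from transformers import AutoTokenizer

# Example checkpoint; any Llama 3 tokenizer behaves the same way here.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

# A prompt that already contains a BOS token, e.g. from a chat template.
text = tokenizer.bos_token + "Hello there"

# add_special_tokens=True (the default) prepends BOS a second time.
ids = tokenizer(text)["input_ids"]

# A doubled BOS hurts fine-tuning accuracy, so detect and strip the extra one.
if ids[:2] == [tokenizer.bos_token_id] * 2:
    ids = ids[1:]
```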
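For items 8 and 9, a loading sketch using Unsloth's FastLanguageModel interface; the checkpoint name and sequence length are illustrative, not the speaker's exact settings.

```python
from unsloth import FastLanguageModel

max_seq_length = 2048  # pick the longest sequence your data actually needs

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3-8b-bnb-4bit",  # pre-quantized 4-bit checkpoint
    max_seq_length=max_seq_length,
    load_in_4bit=True,  # 4-bit loading greatly reduces memory usage
    dtype=None,         # auto-detect float16 / bfloat16 for the GPU
)
```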
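For items 11 to 13, a LoRA configuration sketch via Unsloth's get_peft_model; the rank and alpha values are illustrative examples of "a power of two, not too large".

```python
model = FastLanguageModel.get_peft_model(
    model,
    r=16,             # rank: a power of two; huge ranks overfit and waste memory
    lora_alpha=16,
    target_modules=[  # include both the attention and MLP projections
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    use_gradient_checkpointing="unsloth",  # the long-context mode from item 13
)
```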
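For item 14, a sketch that merges several dataset columns into one training text using the tokenizer's chat template (reusing the tokenizer from the loading sketch). The dataset and its column names (instruction, input, output) are assumptions for illustration.

```python
from datasets import load_dataset

# Example dataset with instruction/input/output columns; swap in your own.
dataset = load_dataset("yahma/alpaca-cleaned", split="train")

def merge_columns(example):
    # Hypothetical column names; adapt them to your dataset's schema.
    messages = [
        {"role": "user", "content": example["instruction"] + "\n" + example["input"]},
        {"role": "assistant", "content": example["output"]},
    ]
    return {"text": tokenizer.apply_chat_template(messages, tokenize=False)}

dataset = dataset.map(merge_columns)  # one merged "text" column for training
```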
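For items 16 to 18, a training sketch using the older trl SFTTrainer signature seen in Unsloth's notebooks, with batch size 2 × gradient accumulation 4 = effective batch 8 and a 2e-4 learning rate; max_steps and output_dir are placeholders.

```python
from trl import SFTTrainer
from transformers import TrainingArguments

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    args=TrainingArguments(
        per_device_train_batch_size=2,  # batch size 2 ...
        gradient_accumulation_steps=4,  # ... times 4 steps = effective batch of 8
        learning_rate=2e-4,             # 2e-4 or smaller, per item 18
        max_steps=60,                   # placeholder; tune for your dataset
        output_dir="outputs",
    ),
)
trainer.train()
```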
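For item 19, an inference sketch; the assertion guards against the double-BOS problem from item 6, since the chat template already inserts BOS on its own.

```python
FastLanguageModel.for_inference(model)  # switch Unsloth to inference mode

messages = [{"role": "user", "content": "What is 2 + 2?"}]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,  # append the assistant header for generation
    return_tensors="pt",
).to(model.device)

# The template already inserts BOS; make sure it appears exactly once.
assert (input_ids[0][:2] == tokenizer.bos_token_id).sum().item() == 1

output = model.generate(input_ids, max_new_tokens=64)
print(tokenizer.decode(output[0]))
```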
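For item 20, a saving sketch based on Unsloth's save_pretrained_gguf helper; passing a list of quantization methods is my reading of the "multiple GGUF files" feature, so treat the argument form as an assumption, and the output name is an example.

```python
# Export several GGUF quantizations in one call (assumed list-form argument).
model.save_pretrained_gguf(
    "llama3-finetune",  # example output directory
    tokenizer,
    quantization_method=["q4_k_m", "q8_0"],
)
```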
Source: AI Engineer via YouTube
❓ What do you think? What does the experience of fine-tuning AI models, like LLaMA 3, reveal about the complexities and challenges of artificial intelligence in our daily lives? Feel free to share your thoughts in the comments!