Fixing Bugs in Llama 3: A Pre-Release Update

Join me as we dive into the world of open-source AI models, exploring common bugs and fixes, including those in Llama 3, and learning how to overcome challenges like double BOS tokens, incorrect model usage, and more.

  • 1. The speaker will discuss fixing bugs in open source models, specifically focusing on Llama 3 bugs.
  • 2. Slides from the previous day's workshop can be found at tinyurl.com/workshopslides and additional resources at tinyurl.com/unof.
  • 3. The speaker has experience fixing bugs in Google's open source model, Gemma.
  • 4. One common issue in fine-tuning language models is tokenization problems.
  • 5. There are eight bugs in Llama 3; some have not been announced yet.
  • 6. First bug: Do not use double Beginning of Sequence (BOS) tokens when fine-tuning, as they can reduce accuracy (see the double-BOS check sketched below).
  • 7. The Unsloth library can help fix issues with double BOS tokens.
  • 8. Second bug: Be cautious of memory usage and sequence length in models.
  • 9. Loading models in 4-bit saves memory; set the max sequence length carefully, since memory usage scales with it (see the loading sketch below).
  • 10. Unsloth supports fine-tuning various models, including Llama, Mistral, Gemma, and more.
  • 11. When fine-tuning, use powers of two for the LoRA rank, and avoid overly large ranks to prevent overfitting and excessive memory usage (see the LoRA configuration sketch below).
  • 12. Make sure to include all the target modules (the q, k, v, o, gate, up, and down projections) when fine-tuning.
  • 13. Unsloth has a gradient checkpointing option called "unsloth" for long-context fine-tuning.
  • 14. Customizable chat templates are supported in Unsloth, allowing users to merge multiple dataset columns into one for training.
  • 15. Users should be careful with chat template repetitions; two iterations are recommended for the Llama 3 chat template.
  • 16. Set a batch size of 2 and gradient accumulation of 4 for fine-tuning in Unsloth (see the trainer sketch below).
  • 17. The effective batch size is calculated as batch size times gradient accumulation steps (e.g., 2 x 4 = 8).
  • 18. Use a learning rate of 2e-4 or smaller for fine-tuning.
  • 19. After fine-tuning, use the tokenizer's apply_chat_template for inference, ensuring no double BOS tokens are added.
  • 20. Unsloth now supports saving to multiple GGUF files in one go for easier model management (see the GGUF export sketch below).
  • 21. The Ollama chat template notebook can be found at tinyurl.com/unsoft2.
  • 22. Don't forget to join the speaker's Discord Channel for further questions and discussions about AI and bug fixes.

Source: AI Engineer via YouTube
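
Here is a minimal sketch of the double-BOS issue from points 6, 7, and 19. It assumes a Llama 3 Instruct checkpoint (the repo name is only an example); the key point is that apply_chat_template already prepends the BOS token, so tokenizing its output again with add_special_tokens=True produces a second one.

```python
from transformers import AutoTokenizer

# Any Llama 3 Instruct checkpoint behaves the same way; this repo name is just an example.
tokenizer = AutoTokenizer.from_pretrained("unsloth/llama-3-8b-Instruct")

messages = [{"role": "user", "content": "Hello!"}]

# apply_chat_template already prepends <|begin_of_text|> (the BOS token).
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

ids_double = tokenizer(text, add_special_tokens=True).input_ids   # BOS added again -> double BOS
ids_single = tokenizer(text, add_special_tokens=False).input_ids  # single BOS, as intended

print(ids_double[:2])  # two copies of the BOS id back to back
print(ids_single[:2])  # one BOS id, then the first real token
```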
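
For points 8–10, this is a sketch of loading a pre-quantized 4-bit Llama 3 with Unsloth and an explicit max sequence length; the checkpoint name and the length of 2048 are illustrative choices, not values prescribed by the talk.

```python
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3-8b-bnb-4bit",  # pre-quantized 4-bit checkpoint
    max_seq_length=2048,   # activation/KV memory scales with this, so size it to your data
    dtype=None,            # auto-detect (bfloat16 on recent GPUs)
    load_in_4bit=True,     # 4-bit loading saves a large amount of VRAM
)
```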
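
Points 11–13 translate into a LoRA configuration like the one below: a power-of-two rank, all seven projection modules, and Unsloth's "unsloth" gradient checkpointing for long context. The specific values (r=16, lora_alpha=16) are illustrative.

```python
model = FastLanguageModel.get_peft_model(
    model,
    r=16,  # keep the rank a power of two; very large ranks overfit and waste memory
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",  # long-context fine-tuning option
)
```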
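
Points 16–18 map onto trainer settings roughly as follows. The exact SFTTrainer keyword arguments vary between trl versions, and `dataset` is a placeholder for your own dataset with a "text" column.

```python
import torch
from trl import SFTTrainer
from transformers import TrainingArguments

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,        # placeholder: a Dataset with a "text" column
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(
        per_device_train_batch_size=2,   # batch size of 2
        gradient_accumulation_steps=4,   # effective batch size = 2 x 4 = 8
        learning_rate=2e-4,              # 2e-4 or smaller
        num_train_epochs=1,
        fp16=not torch.cuda.is_bf16_supported(),
        bf16=torch.cuda.is_bf16_supported(),
        optim="adamw_8bit",
        output_dir="outputs",
    ),
)
trainer.train()
```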
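
Finally, for point 20, a sketch of exporting to GGUF with Unsloth. The list form is how I read the "multiple GGUF files" feature mentioned in the talk, and the output directory and quantization methods here are only examples.

```python
# Export the fine-tuned model to GGUF. Passing several quantization methods
# produces several .gguf files in one call.
model.save_pretrained_gguf(
    "llama3-finetune-gguf",                  # output directory (example name)
    tokenizer,
    quantization_method=["q4_k_m", "q8_0"],  # example quantization choices
)
```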

❓ What do you think? What does the experience of fine-tuning AI models, like LLaMA 3, reveal about the complexities and challenges of artificial intelligence in our daily lives? Feel free to share your thoughts in the comments!