QLoRA—How to Fine-tune an LLM on a Single GPU (w/ Python Code)

100.3K views · 3,200 likes · 36:58 · Feb 27, 2024

🤝 Want your team maximizing Claude? I run 1:1 and team AI workshops for companies doing $1M+ per year: https://aibuilder.academy/yt/XpoKB3usmKc

In this video, I discuss fine-tuning an LLM using QLoRA (i.e. Quantized Low-rank Adaptation). Example code is provided for training a custom YouTube comment responder using Mistral-7b-Instruct.

More Resources:
▶️ Series Playlist: https://www.youtube.com/playlist?list=PLz-ep5RbHosU2hnz5ejezwaYpdMutMVB0
🎥 Fine-tuning with OpenAI: https://youtu.be/4RAvJt3fWoI
📰 Read more: https://medium.com/towards-data-science/qlora-how-to-fine-tune-an-llm-on-a-single-gpu-4e44d6b5be32?sk=4dccc921ab3bd4adc90248293cb13740
💻 Colab: https://colab.research.google.com/drive/1AErkPgDderPW0dgE230OOjEysd0QV1sR?usp=sharing
💻 GitHub: https://github.com/ShawhinT/YouTube-Blog/tree/main/LLMs/qlora
🤗 Model: https://huggingface.co/shawhin/shawgpt-ft
🤗 Dataset: https://huggingface.co/datasets/shawhin/shawgpt-youtube-comments

[1] Fine-tuning LLMs: https://youtu.be/eC6Hd1hFvos
[2] ZeRO paper: https://arxiv.org/abs/1910.02054
[3] QLoRA paper: https://arxiv.org/abs/2305.14314
[4] Phi-1 paper: https://arxiv.org/abs/2306.11644
[5] LoRA paper: https://arxiv.org/abs/2106.09685

Intro - 0:00
Fine-tuning (recap) - 0:45
LLMs are (computationally) expensive - 1:22
What is Quantization? - 4:49
4 Ingredients of QLoRA - 7:10
Ingredient 1: 4-bit NormalFloat - 7:28
Ingredient 2: Double Quantization - 9:54
Ingredient 3: Paged Optimizer - 13:45
Ingredient 4: LoRA - 15:40
Bringing it all together - 18:24
Example code: Fine-tuning Mistral-7b-Instruct for YT Comments - 20:35
What's Next? - 35:22
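
For a rough sense of why 4-bit quantization and LoRA make single-GPU fine-tuning feasible, here is a minimal back-of-the-envelope sketch. It is not taken from the video's code. The helper names are hypothetical, the 7e9 parameter count is a round figure for a "7B" model, and the 4096 x 4096 layer shape and rank 8 are illustrative assumptions only. It counts weight storage alone and ignores gradients, optimizer state, activations and quantization constants.

```python
def weight_memory_gb(n_params: float, bits_per_param: int) -> float:
    """Memory to store the weights alone, in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9


def lora_trainable_params(d: int, k: int, r: int) -> int:
    """Trainable parameters LoRA adds to one d x k weight matrix.

    LoRA learns two low-rank factors, B (d x r) and A (r x k),
    so it trains r * (d + k) values instead of d * k.
    """
    return r * (d + k)


# Assumption: a round 7 billion parameters for a "7B" model.
n_params = 7e9
fp16_gb = weight_memory_gb(n_params, 16)  # 16-bit weights
four_bit_gb = weight_memory_gb(n_params, 4)  # 4-bit quantized weights

# Assumption: one square 4096 x 4096 layer adapted with rank r = 8.
d, k, r = 4096, 4096, 8
full = d * k
lora = lora_trainable_params(d, k, r)

print(f"16-bit weights: {fp16_gb:.1f} GB, 4-bit weights: {four_bit_gb:.1f} GB")
print(f"LoRA trains {lora} of {full} parameters ({lora / full:.2%}) for this layer")
```

Under these assumptions the frozen base weights shrink by a factor of four, and only a small fraction of each adapted layer is trained.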
