How to run Mixtral LLM on your Laptop - January 26, 2024 - Exciting AI Updates

608 views · 29 likes · 30:27 · Jan 26, 2024

Presented by Denis Mazur & Artyom Eliseev

Slides - https://github.com/lselector/seminar/tree/master/2024

Run Mixtral on Nvidia 3060 with 12GB!
- https://arxiv.org/abs/2312.17238 - paper
- https://github.com/dvmazur/mixtral-offloading
- https://twitter.com/rohanpaul_ai/status/1741103866047869222

Very elegant work. The original Mixtral requires more than 90 GB of memory, and almost 97% of that is taken by the Feed-Forward Networks in the transformer layers. The authors combined several techniques to reduce the memory requirements while preserving the accuracy of the model.

They tested multiple quantization methods and selected a flexible scheme in which different parts of the network are quantized differently. To reduce GPU memory requirements further, they implemented dynamic loading and offloading of the expert networks in the transformer layers. They used "speculative" loading: trying to predict which Feed-Forward Network experts will be needed and loading only those.

As a result, they demonstrated that you can run Mixtral on a modest laptop with an Nvidia 3060 with only 12GB of memory at decent (practical) performance.

Denis Mazur
- https://github.com/dvmazur
- https://huggingface.co/dvmazur

Artyom Eliseev
- https://github.com/lavawolfiee
- https://huggingface.co/lavawolfiee

My websites:
- Enterprise AI Solutions - https://EAIS.ai
- Linkedin - https://www.linkedin.com/in/levselector
- GitHub - https://github.com/lselector
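The memory figures quoted in the description (more than 90 GB in total, almost 97% of it in the Feed-Forward Networks) can be sanity-checked with a short parameter count. The sketch below is not from the presenters. It assumes the publicly documented Mixtral-8x7B hyperparameters (hidden size 4096, feed-forward size 14336, 32 layers, 8 experts per layer, vocabulary of 32000, grouped-query attention with 8 key/value heads of 128 dimensions) and 16-bit weights. All function and variable names are illustrative.

```python
# Back-of-the-envelope parameter count for Mixtral-8x7B.
# ASSUMPTION: hyperparameters below come from the public model config,
# not from the video description.
HIDDEN = 4096      # model (embedding) dimension
FFN = 14336        # inner dimension of each expert's feed-forward network
LAYERS = 32        # number of transformer layers
EXPERTS = 8        # experts per layer
VOCAB = 32000      # vocabulary size
KV_DIM = 8 * 128   # 8 key/value heads x 128 dims (grouped-query attention)


def count_params():
    """Return (total parameters, parameters held in expert FFNs)."""
    expert = 3 * HIDDEN * FFN                  # three weight matrices per expert
    experts_per_layer = EXPERTS * expert
    attention = 2 * HIDDEN * HIDDEN + 2 * HIDDEN * KV_DIM  # q, o + k, v
    router = HIDDEN * EXPERTS                  # gating layer that picks experts
    norms = 2 * HIDDEN                         # two normalization layers
    per_layer = experts_per_layer + attention + router + norms

    embeddings = 2 * VOCAB * HIDDEN            # input embeddings + output head
    final_norm = HIDDEN
    total = LAYERS * per_layer + embeddings + final_norm
    return total, LAYERS * experts_per_layer


total, experts_total = count_params()
expert_share = experts_total / total
fp16_gb = total * 2 / 1e9                      # 2 bytes per weight, decimal GB

print(f"total parameters : {total / 1e9:.1f} B")
print(f"expert share     : {expert_share:.1%}")
print(f"16-bit footprint : {fp16_gb:.1f} GB")
```

Under these assumptions the count comes to about 46.7 billion parameters, roughly 93 GB at 16 bits, with the experts accounting for about 96.6% of the weights. That matches the description's figures and shows why quantizing and offloading the experts is where nearly all of the savings are.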
