Llama3 speed test on Dual Nvidia 3090

1.8K views· 14 likes· 3:56· May 13, 2024

ShareTwitter Facebook LinkedIn Instagram

🛍️ Products Mentioned (3)

Llama3 speed test on Linux PC with Two Nvidia RTX 3090 with 24GB - 48GB total. Presented by Lev Selector - May 13, 2024 Slides - https://github.com/lselector/seminar/tree/master/2024 --------- My websites: - Enterprise AI Solutions - https://EAIS.ai - Linkedin - https://www.linkedin.com/in/levselector - GitHub - https://github.com/lselector --------- Contents of today's video: We have tested llama3 (70b and 8b) on a desktop with two Nvidia RTX 3090 24GB video cards. - CPU: AMD Ryzen™ 9 3900X (12 cores, 24 threads) - RAM: 32GB - WSL2 (Windows Subsystem for Linux). - We used ollama to run llama3 8b and 70b - Actual models were: llama3:latest, llama3:70b, lama3:70b-instruct-q4_K_M - We also compared with performance on Apple Macbook Pro Max M3 128GB The prompt was: Please make a numbered chronological list of the last ten (10) US presidents in reverse order. The list should start like this: 1. Joe Biden (2021-present); 2. Donald Trump (2017-2021); 3. Barack Obama (2009-2017); the list should contain 10 rows. Important - make a fresh list. Disregard the chat history. Ouput only the list itself, nothing else. Output each lsit element on a separate line. Results below show total duration of the response (in seconds) and output speed (in tokens/s). We compare the Linux (Windows WSL) with latest MacBook Pro Max M3 128GB 70b Models: Linux. 9.7 s, 14 t/s Mac 16.5 s, 8.6 t/s 8b Models: both were 2.6 sec, 56 t/s

Watch on YouTube