
Run a Local LLM: Ollama + Home Assistant

81.0K views · 1,983 likes · 16:42 · Jul 30, 2025


To learn for free on Brilliant, go to https://brilliant.org/StratoBuilds/. You’ll also get 20% off an annual premium subscription.

In this video, I walk you through my complete local LLM setup using Ollama on a Mac Mini M4 Pro, integrated with Home Assistant for fast, private, offline voice control.

– Install and run Ollama on Mac
– Integrate Ollama with Home Assistant
– Cache Home Assistant responses for lightning-fast interactions
– See my current favorite models for running Home Assistant voice
– Visualize LLM activity with WLED
– Run a local ChatGPT-style LLM chat arena with Open WebUI

👉 Full write-up and additional resources: https://stratobuilds.com/project/local-llm-ollama-home-assistant/

Chapters:
00:00 - Intro
00:51 - Hardware & Software Overview
02:10 - Home Assistant + Local LLM Voice Demo
02:40 - Caching LLM Responses
03:58 - WLED LLM Activity Animations
05:05 - Installing Ollama on macOS
08:07 - Model Testing on Mac Mini M4 Pro
11:24 - Integrating Ollama with Home Assistant
12:48 - Brilliant Sponsorship Message
13:49 - Open WebUI (Local ChatGPT) Overview
15:33 - Final Thoughts & Wrap-Up

This video was sponsored by Brilliant.
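For readers who want to poke at the Ollama server before wiring it into Home Assistant, here is a minimal Python sketch (standard library only) of one non-streaming call to Ollama's local REST API. It assumes Ollama is running on its default port 11434 on the same machine; the helper names are hypothetical, and the model name in the usage note is a placeholder, not one of the models picked in the video.

```python
import json
import urllib.request

# Assumption: Ollama is listening on its default address and port on this machine.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(model: str, prompt: str) -> urllib.request.Request:
    """Build a non-streaming POST request for Ollama's /api/generate endpoint."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    return urllib.request.Request(
        OLLAMA_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, timeout: float = 60.0) -> str:
    """Send the prompt and return the model's complete text reply."""
    req = build_generate_request(model, prompt)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        payload = json.load(resp)
    # With "stream": False, Ollama answers with a single JSON object;
    # the generated text is in its "response" field.
    return payload["response"]
```

For example, `generate("llama3.2", "Summarize today's weather in one sentence.")` returns the reply as a string once that model has been pulled with `ollama pull`. Ollama binds to 127.0.0.1 by default, so if Home Assistant runs on a different machine, Ollama's `OLLAMA_HOST` setting has to be changed (for example to `0.0.0.0`) before the integration can reach it.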

About This Video

By far the most common question I get whenever I talk about Home Assistant Voice or LLMs is: “Are you running it locally?” In this video, I can finally say yes, and I walk through my full privacy-first setup: Ollama running on a Mac Mini M4 Pro, integrated into Home Assistant for fast, offline voice control. I show a real-time voice demo (no cloud, no third-party processing), then break down why Apple Silicon is such a compelling local-LLM box thanks to its unified memory (the GPU shares the system's memory pool, so larger models fit), and how I think about model choice as a balance between speed and accuracy.

The big takeaway is my “cached responses” strategy. Instead of exposing hundreds of entities and making the model reason from scratch every time, I lean on Home Assistant scripts and predefined pathways. A larger model generates clean summaries (like an hourly spoken weather report), I cache that output, and then a lightweight model can serve it instantly for real-time voice.

I also wired up a WLED “SOAP” animation that speeds up based on the Mac Mini’s power draw, so I can literally see when the LLM is working.

Finally, I give a quick look at Open WebUI, my local ChatGPT-style interface for testing models side by side and running everything outside of Home Assistant.
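The cached-responses idea (generate a summary with a larger model, then serve the stored text on demand) can be sketched as a small time-based cache. This is a hypothetical Python illustration, not the author's Home Assistant scripts: `CachedSummary`, the `generate` callable standing in for the larger-model call, and the one-hour `max_age` (matching the hourly weather example) are all assumptions.

```python
import time
from typing import Callable, Optional


class CachedSummary:
    """Serve a pre-generated summary until it is older than max_age seconds.

    Hypothetical sketch: `generate` stands in for whatever asks the larger
    model for a fresh summary (for example, an hourly weather report).
    """

    def __init__(
        self,
        generate: Callable[[], str],
        max_age: float = 3600.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._generate = generate
        self._max_age = max_age
        self._clock = clock  # injectable so the cache can be tested without waiting
        self._text: Optional[str] = None
        self._created_at = 0.0

    def refresh(self) -> str:
        """Call the slow generator now (e.g. from an hourly job) and store the result."""
        self._text = self._generate()
        self._created_at = self._clock()
        return self._text

    def get(self) -> str:
        """Return the cached text, regenerating only if it is missing or stale."""
        if self._text is None or self._clock() - self._created_at > self._max_age:
            return self.refresh()
        return self._text
```

The design point is that the expensive call happens off the voice path: a scheduled `refresh()` keeps the text warm, so a voice request only pays for `get()`, which is a lookup unless the cache has gone stale.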
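The WLED activity light can be approximated by mapping the Mac Mini's power draw onto WLED's effect-speed value. The sketch below is a hypothetical Python illustration, not the author's automation: the idle and busy wattages are placeholder calibration points, the host address is whatever your WLED controller uses, and choosing the “SOAP” effect itself is assumed to be done beforehand in the WLED UI. It uses WLED's JSON API (`/json/state`), where a segment's `sx` field is the effect speed from 0 to 255.

```python
import json
import urllib.request

# Hypothetical calibration points: replace with what your power meter reports.
IDLE_WATTS = 5.0
BUSY_WATTS = 60.0
MIN_SPEED, MAX_SPEED = 20, 255  # WLED effect speed ("sx") ranges from 0 to 255


def power_to_speed(watts: float) -> int:
    """Linearly map power draw onto the effect-speed range, clamped at both ends."""
    fraction = (watts - IDLE_WATTS) / (BUSY_WATTS - IDLE_WATTS)
    fraction = max(0.0, min(1.0, fraction))
    return round(MIN_SPEED + fraction * (MAX_SPEED - MIN_SPEED))


def set_wled_speed(host: str, watts: float) -> None:
    """POST the new speed for the first segment to WLED's JSON state endpoint."""
    body = json.dumps({"seg": [{"sx": power_to_speed(watts)}]}).encode("utf-8")
    req = urllib.request.Request(
        f"http://{host}/json/state",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    urllib.request.urlopen(req, timeout=5).close()
```

Clamping keeps brief spikes or idle dips from pushing the value outside WLED's range, and a linear map is the simplest choice; a smoothed or logarithmic curve may look better if the power reading is noisy.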

