Running LLMs Locally with Ollama: A Practical Workflow
Local LLMs went from "toy" to "genuinely useful" in 2024. Here's the setup I use daily for prototyping without burning OpenAI credits.
I run a local LLM almost every day now. Not for production — for the loop where I'd otherwise spam GPT-4 with throwaway prompts while iterating.
The setup
Install Ollama (brew install ollama on macOS), pull a model (ollama pull llama3.1:8b), start chatting (ollama run llama3.1:8b), and you're in your terminal talking to a model in about two minutes.
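Once the daemon is up, you can sanity-check the install from code too. Here's a minimal sketch using only the Python standard library; it assumes the default port 11434 and just returns an empty list if the server isn't running:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"

def list_local_models(base_url: str = OLLAMA_URL) -> list[str]:
    """Return names of locally pulled models, or [] if Ollama isn't reachable."""
    try:
        # /api/tags lists every model you've pulled onto this machine.
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=2) as resp:
            data = json.load(resp)
        return [m["name"] for m in data.get("models", [])]
    except OSError:
        # Daemon not running (connection refused) or timed out.
        return []

print(list_local_models())
```

If that prints an empty list, check that `ollama serve` (or the desktop app) is actually running.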
The models I actually use
For coding: qwen2.5-coder:7b. Surprisingly close to GPT-4o-mini for autocomplete and small refactors. Runs on a laptop.
For general chat: llama3.1:8b or mistral-nemo:12b if your machine can handle it.
Wire it into your editor
Ollama exposes an OpenAI-compatible HTTP API on localhost:11434. That means Continue, Cline, Aider, and friends all work with a one-line config change: point the base URL at http://localhost:11434/v1.
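The same endpoint works from plain code, no SDK required. A sketch of a chat call against the OpenAI-compatible route, again stdlib-only; the payload builder is split out so you can see exactly what goes over the wire (the model name and prompt are just examples):

```python
import json
import urllib.request

def build_chat_request(model: str, prompt: str) -> dict:
    # Same shape as the OpenAI chat-completions schema that Ollama mirrors.
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }

def chat(prompt: str, model: str = "llama3.1:8b") -> str:
    body = json.dumps(build_chat_request(model, prompt)).encode()
    req = urllib.request.Request(
        "http://localhost:11434/v1/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

# Usage (needs the daemon running and the model pulled):
#   chat("Explain a mutex in one sentence.")
```

Because the request and response shapes match OpenAI's, swapping an existing script between local and hosted models is mostly a base-URL change.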
The honest limits
Local 8B models are not GPT-4. They forget context, refuse weird things, and make up library APIs. But they're free, private, and fast enough for the prototype-and-iterate loop.