POST(AI)                         netgod.dev manual                         POST(AI)
NAME

Running LLMs Locally with Ollama: A Practical Workflow

DESCRIPTION

Local LLMs went from "toy" to "genuinely useful" in 2024. Here's the setup I use daily for prototyping without burning OpenAI credits.

DATE
2025-03-22
DURATION
1 min read
COVER
./assets/running-llms-locally-with-ollama.png
CONTENT

I run a local LLM almost every day now. Not for production — for the loop where I'd otherwise spam GPT-4 with throwaway prompts while iterating.

The setup

Install Ollama (brew install ollama on macOS; installers for Linux and Windows are on ollama.com), pull a model (ollama pull llama3.1:8b), and you're chatting in your terminal in two minutes.
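The whole setup as a copy-paste block, assuming macOS with Homebrew (the model tag is the one from Ollama's library used throughout this post):

```shell
# Install the Ollama CLI and local server (macOS via Homebrew)
brew install ollama

# Start the server in the background if it isn't already running
ollama serve &

# Download the weights (a few GB, quantized) and drop into a chat REPL
ollama pull llama3.1:8b
ollama run llama3.1:8b
```

`ollama run` pulls the model automatically if it's missing, so the explicit `pull` is optional — it just makes the big download a deliberate step.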

The model I actually use

For coding: qwen2.5-coder:7b. Surprisingly close to GPT-4o-mini for autocomplete and small refactors. Runs on a laptop.

For general chat: llama3.1:8b or mistral-nemo:12b if your machine can handle it.
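Swapping models is the same one-liner; these tags match the names above, and you can also pass a prompt directly instead of opening the REPL:

```shell
# Grab the coding model and ask it something non-interactively
ollama pull qwen2.5-coder:7b
ollama run qwen2.5-coder:7b "Rewrite this loop as a list comprehension: for x in xs: ys.append(x*2)"

# See what's on disk
ollama list
```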

Wire it into your editor

Ollama exposes an OpenAI-compatible HTTP API on localhost:11434. That means Continue, Cline, Aider, and friends all work with one config line.
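A minimal sketch of what that looks like: the same request body you'd send to OpenAI, aimed at the local server's `/v1` path (assumes the server is running and `llama3.1:8b` is pulled; no API key is needed, though some clients require a placeholder value):

```shell
# OpenAI-compatible chat completion against the local Ollama server
curl -s http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "llama3.1:8b",
        "messages": [{"role": "user", "content": "Say hello in five words."}]
      }'
```

Editor tools work the same way: point their OpenAI-compatible base URL at http://localhost:11434/v1 and set the model name to whatever you pulled.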

The honest limits

Local 8B models are not GPT-4. They forget context, refuse weird things, and make up library APIs. But they're free, private, and fast enough for the prototype-and-iterate loop.
