Mac Mini for AI Agents: Complete Setup Guide (2026)
The Mac Mini is quietly becoming the default hardware for running AI agents at home. I've been using one as my always-on AI agent server for months, and it's genuinely changed my setup. This is the complete Mac Mini for AI agents setup guide — hardware specs, what to install, how to configure everything, and the exact workflow I use daily in 2026.
Why Mac Mini for AI Agents?
Three reasons the Mac Mini keeps coming up in every AI agent conversation:
- Unified memory. Apple Silicon shares memory between CPU and GPU. A 24GB Mac Mini gives your AI model access to all 24GB — no separate VRAM needed. On PC, you'd need an expensive GPU with dedicated VRAM.
- Power efficiency. The Mac Mini M4 idles at about 5-7 watts. Under AI workload, it pulls 30-60 watts. Compare that to a PC with a GPU pulling 200-400 watts. For an always-on agent server, electricity cost matters.
- Small and silent. It's literally the size of a sandwich. No fan noise under normal loads. Sits on a shelf and runs 24/7 without anyone noticing.
When I say "Mac Mini" in this guide, I mean the M4 generation (released late 2024). The M4 base and M4 Pro are both excellent. The difference matters, and I'll break it down.
Which Mac Mini to Buy
| Config | Price | RAM | Best For |
|---|---|---|---|
| Mac Mini M4 (base) | $599 | 16GB | Lightweight agents, 7B models, cloud API setup |
| Mac Mini M4 (upgraded) | $799 | 24GB | Best value — runs 14B models, handles most agent tasks |
| Mac Mini M4 Pro | $1,399 | 24GB | Faster inference, better for 32B+ models |
| Mac Mini M4 Pro (upgraded) | $1,799 | 48GB | Runs 70B models, serious home lab |
My recommendation: The $799 M4 with 24GB is the sweet spot. It runs 14B parameter models at usable speeds (15-25 tokens/second), costs less than a mid-range GPU, and handles everything except the largest models. This is what I tell everyone in our bootcamps to start with.
If you're planning to sell AI agent setups to businesses, the M4 Pro with 48GB is worth it — you can demo larger models and handle more concurrent agents.
Step 1: Initial Mac Mini Setup
Unbox it, plug it in, and run through Apple's setup. Then:
System Settings to Change
- Energy Saver → Prevent automatic sleeping: Turn this ON. Your agents need the machine awake 24/7.
- Energy Saver → Start up automatically after power failure: Turn ON. If power blips, you want it coming back.
- Sharing → Remote Login: Enable SSH so you can manage it headlessly from another machine.
- General → Software Update → Automatic Updates: Keep on but set to install overnight so updates don't kill your agents mid-task.
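If you'd rather script the always-on settings, the same changes can be made from Terminal. A sketch using standard macOS utilities — run it on the Mini itself, and verify flag names against `man pmset` on your macOS version:

```shell
# Never let the machine sleep (agents need it awake 24/7)
sudo pmset -a sleep 0

# Come back automatically after a power failure
sudo pmset -a autorestart 1

# Enable Remote Login (SSH) for headless management
sudo systemsetup -setremotelogin on
```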
Install Homebrew
Open Terminal and run:
```bash
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
```

Homebrew is the package manager for macOS. You'll need it for everything else.
Install Essential Tools
```bash
brew install git node python3 tmux htop
```

Step 2: Install Ollama for Local Models
Ollama is the easiest way to run local AI models. Download it from ollama.com or install via Homebrew:
```bash
brew install ollama
```

Start the Ollama service:

```bash
ollama serve
```

Pull your first model. I recommend starting with Qwen 2.5 14B — it's excellent for agent worker tasks:

```bash
ollama pull qwen2.5:14b
```

Test it works:

```bash
ollama run qwen2.5:14b "What is an AI agent in one sentence?"
```

You should get a response in a few seconds. Ollama now serves a local API at http://localhost:11434 — your agents can hit this endpoint just like they'd hit OpenAI's API.
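Once Ollama is serving, any HTTP client can talk to it. Here's a minimal Python sketch against Ollama's `/api/generate` endpoint — the call only succeeds if the server is actually running locally:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's local endpoint

def build_request(model: str, prompt: str) -> dict:
    """Payload for a single non-streaming completion."""
    return {"model": model, "prompt": prompt, "stream": False}

def generate(model: str, prompt: str) -> str:
    """POST to the local Ollama server and return the model's text."""
    payload = json.dumps(build_request(model, prompt)).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# With the server running:
# print(generate("qwen2.5:14b", "What is an AI agent in one sentence?"))
```

Swap `model` for any tag you've pulled; your agents use the exact same payload shape.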
Models I Recommend for Mac Mini
| Model | Size | RAM Needed | Use Case |
|---|---|---|---|
| Qwen 2.5 7B | 4.7GB | 8GB+ | Simple tasks, classification, formatting |
| Qwen 2.5 14B | 9.0GB | 16GB+ | Best all-rounder for agent workers |
| Llama 4 Scout | 18GB | 24GB+ | More capable reasoning, summarization |
| Mistral Small 3.1 | 15GB | 24GB+ | Good for code and structured output |
| Llama 4 Maverick (quantized) | 35GB | 48GB+ | Near-cloud quality (needs M4 Pro 48GB) |
On the 24GB Mac Mini, stick with models under 18GB to leave room for your OS and other applications.
Step 3: Set Up OpenClaw
OpenClaw is what ties everything together. It's the agent framework that manages your AI agents, their memory, their tools, and their communication channels.
Install OpenClaw:
```bash
npm install -g openclaw
```

Initialize your workspace:

```bash
openclaw init
```

Configure your API keys. You'll want at least one cloud API for your brain agent:

```bash
openclaw config set anthropic.apiKey YOUR_KEY_HERE
```

Set your local model as the default for worker tasks:

```bash
openclaw config set agents.defaults.localModel ollama/qwen2.5:14b
```

Start the gateway (this is what keeps your agents alive 24/7):

```bash
openclaw gateway start
```

Your Mac Mini is now running as an AI agent server. The gateway handles heartbeats, cron jobs, and keeps your agents persistent across sessions.
Step 4: Configure Your Agent Architecture
Here's the setup I recommend for a Mac Mini home lab. This matches the memory system guide architecture:
Brain Agent (Cloud)
- Model: Claude Opus or Sonnet (via API)
- Role: Planning, complex reasoning, writing, code generation
- Cost: $15-75 per million tokens
- This is your "smart" agent that makes decisions
Worker Agents (Local)
- Model: Qwen 2.5 14B via Ollama
- Role: Data processing, formatting, simple extraction, classification
- Cost: $0 (runs on your Mac Mini)
- These handle the volume work your brain agent delegates
Coding Agents (Cloud)
- Model: Claude Code or Codex via API
- Role: Writing and editing code, debugging, refactoring
- Cost: Varies by usage
- Local models can't match cloud models for serious code generation yet
This hybrid architecture means your Mac Mini handles 70-80% of agent requests locally (free), and only the complex tasks go to cloud APIs (paid). That's the cost optimization that makes running agents 24/7 affordable.
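The routing logic behind that hybrid split is simple enough to sketch in a few lines. The task labels and model identifiers below are illustrative, not OpenClaw's actual config keys:

```python
# Hybrid routing sketch: free local worker for volume tasks,
# paid cloud model only when real reasoning is needed.
LOCAL_MODEL = "ollama/qwen2.5:14b"       # free: runs on the Mac Mini
CLOUD_MODEL = "anthropic/claude-sonnet"  # paid: reserved for hard tasks

WORKER_TASKS = {"classify", "format", "extract", "summarize"}

def pick_model(task_type: str) -> str:
    """Send volume work to the local worker; escalate the rest."""
    return LOCAL_MODEL if task_type in WORKER_TASKS else CLOUD_MODEL
```

When most of your traffic falls in the worker bucket, the 70-80% local figure follows naturally.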
Step 5: Set Up Always-On Operation
Your Mac Mini should run agents without you babysitting it. Here's how:
Keep Ollama Running at Boot
Ollama installs as a macOS service automatically. Verify it's set to start at login:
```bash
launchctl list | grep ollama
```

Keep OpenClaw Gateway Running

The OpenClaw gateway persists across reboots. Check status:

```bash
openclaw gateway status
```

Monitor Resource Usage
Use htop or Activity Monitor to watch memory usage. On a 24GB Mac Mini running Qwen 2.5 14B:
- Model loaded: ~10GB RAM
- OpenClaw gateway: ~200MB RAM
- macOS overhead: ~4GB RAM
- Available headroom: ~10GB
That headroom is plenty for normal operation. If you start running multiple models simultaneously, watch it more closely.
Remote Access
Since SSH is enabled (Step 1), you can manage everything from your main computer:
```bash
ssh your-username@mac-mini-ip
```

I use tmux for persistent terminal sessions on the Mac Mini. Start a tmux session, run your agents, detach, and they keep running:

```bash
tmux new -s agents
# ... start your agents ...
# Press Ctrl+B, then D to detach
```
Step 6: Optimize Performance
Model Quantization
Ollama models come in different quantization levels. Lower quantization = smaller size but slightly lower quality:
- Q8: Best quality, largest size — use if RAM allows
- Q6_K: Great balance — barely any quality loss
- Q4_K_M: Good for fitting larger models in limited RAM
- Q3: Noticeable quality drop — only if desperate for RAM
For the 24GB Mac Mini, use Q6_K or Q8 for 14B models. Quality matters for agent reliability.
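A rough way to estimate what fits: weights take about (parameters × bits per weight) / 8 bytes, plus a few GB of overhead for KV cache and runtime. A back-of-envelope sketch — the bits-per-weight figures are approximations for common GGUF quant types, not exact values:

```python
# Approximate effective bits per weight for common quantization levels
BITS_PER_WEIGHT = {"Q8": 8.5, "Q6_K": 6.6, "Q4_K_M": 4.8, "Q3": 3.4}

def model_size_gb(params_billions: float, quant: str) -> float:
    """Approximate in-RAM size of the weights alone, in GB."""
    bytes_total = params_billions * 1e9 * BITS_PER_WEIGHT[quant] / 8
    return round(bytes_total / 1e9, 1)

# A 14B model at Q6_K lands around 11.5GB of weights, comfortable on 24GB;
# the same model at Q4_K_M drops to roughly 8.4GB, leaving room for context.
print(model_size_gb(14, "Q6_K"), model_size_gb(14, "Q4_K_M"))
```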
Context Length Settings
By default, Ollama runs models with a 2048-token context window. For agent work, you'll want more. Inside an interactive session, set it per run:

```bash
ollama run qwen2.5:14b
>>> /set parameter num_ctx 8192
```

You can also bake `num_ctx` into a Modelfile, or pass it per request in the API's `options` field. Higher context uses more RAM. On 24GB, 8192-16384 tokens is a safe range for 14B models.
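Why context costs RAM: the KV cache grows linearly with context length. A sketch of the standard formula, using illustrative architecture numbers (not the real Qwen 2.5 14B config — read layers, KV heads, and head dimension from the model card):

```python
def kv_cache_gb(layers: int, kv_heads: int, head_dim: int,
                ctx_tokens: int, bytes_per_val: int = 2) -> float:
    """KV cache size: 2 tensors (K and V) per layer, fp16 values."""
    return 2 * layers * kv_heads * head_dim * ctx_tokens * bytes_per_val / 1e9

# Illustrative mid-size model: 48 layers, 8 KV heads (GQA), head_dim 128
print(round(kv_cache_gb(48, 8, 128, 8192), 2))   # 8K context  -> 1.61
print(round(kv_cache_gb(48, 8, 128, 16384), 2))  # 16K context -> 3.22
```

A GB or three of cache on top of a ~10GB model is why 8192-16384 tokens stays comfortable on 24GB.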
Real-World Performance Numbers
Here's what I actually see on my Mac Mini M4 with 24GB running Qwen 2.5 14B:
| Metric | Performance |
|---|---|
| Tokens per second (generation) | 18-25 tok/s |
| Time to first token | 0.3-0.8 seconds |
| Concurrent requests | 2-3 before slowdown |
| Power draw under load | 35-55 watts |
| Monthly electricity cost (24/7) | ~$3-5 |
| Uptime (last 30 days) | 99.8% |
18-25 tokens per second is fast enough for agent worker tasks. You're not waiting around. The brain agent on cloud API is actually the bottleneck in most workflows, not the local worker.
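To put those throughput numbers in dollar terms: here's a quick sketch of what free local generation replaces in cloud spend. The busy fraction and the $0.60 per million output tokens price are assumptions for illustration — actual cloud pricing varies widely:

```python
# What 20 tok/s of free local generation is worth per month, assuming the
# worker is busy 25% of the time and a hypothetical cloud output price.
TOK_PER_SEC = 20
BUSY_FRACTION = 0.25
SECONDS_PER_MONTH = 30 * 24 * 3600
CLOUD_PRICE_PER_MTOK = 0.60  # assumed $/1M output tokens

tokens_per_month = TOK_PER_SEC * BUSY_FRACTION * SECONDS_PER_MONTH
saved_dollars = tokens_per_month / 1e6 * CLOUD_PRICE_PER_MTOK

print(f"{tokens_per_month / 1e6:.0f}M tokens/month, ~${saved_dollars:.0f} of cloud output")
```

Roughly 13M worker tokens a month for the price of a few dollars of electricity — the savings compound once you add more agents.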
Common Mistakes to Avoid
- Don't try to run too large a model. A 70B model on 24GB RAM will crawl. Stick to models that fit comfortably with room to spare.
- Don't skip the brain agent. Local models are workers, not strategists. You still need a cloud API model for the thinking. See our Claude Code vs Codex comparison for brain agent options.
- Don't forget memory management. If Ollama loads a model and you also have Chrome with 50 tabs open, you'll run out of RAM. Dedicate the Mac Mini to agent work.
- Don't ignore updates. Ollama and model weights update frequently. New quantizations and model versions can give you 20-30% speed boosts for free.
What About Mac Studio or DGX Spark?
If you outgrow the Mac Mini:
- Mac Studio M4 Ultra (192GB): Runs the largest open models. Costs $5,000+. Overkill for most people, perfect if you're running a business serving multiple clients.
- NVIDIA DGX Spark: Purpose-built AI hardware. Incredible performance but $3,000+ and less flexible as a general computer.
- Multiple Mac Minis: Some people cluster two Mac Minis — one for the main agent, one as a dedicated model server. Works well and you can upgrade incrementally.
Start with one Mac Mini. Scale when you actually need to.
ALSO: Test Your Mac Mini's AI Performance in 5 Minutes
Already have a Mac Mini? Run this quick benchmark to see where you stand:
- Install Ollama if you haven't:

```bash
brew install ollama
```

- Pull a benchmark model:

```bash
ollama pull qwen2.5:14b
```

- Run a timed generation:

```bash
time ollama run qwen2.5:14b "Write a 200-word summary of how AI agents work" --verbose
```

- Check the output for `eval rate` — that's your tokens per second
- Compare: 15+ tok/s = good, 20+ tok/s = great, 25+ tok/s = excellent

If you're below 15 tok/s, you might be running too large a model for your RAM. Drop down a size or use a more aggressive quantization.
FAQ
Can I use a regular Mac Mini (not Pro) for AI agents?
Yes. The base M4 Mac Mini with 24GB RAM runs 14B parameter models well. The M4 Pro gives you faster inference and more GPU cores, but the standard M4 is genuinely capable for a home AI agent setup. Start with the $799 model and upgrade later if needed.
How many AI agents can a Mac Mini run simultaneously?
It depends on the model size and workload. With a 14B model on 24GB RAM, you can comfortably run 2-3 concurrent agent requests. The OpenClaw gateway queues requests when the model is busy, so you won't crash — things just slow down. For heavy multi-agent setups, the M4 Pro with 48GB handles more concurrency.
Do I still need cloud APIs if I have a Mac Mini running local models?
Yes, for most setups. Local models excel at worker tasks (data processing, formatting, simple extraction), but cloud models like Claude Opus and GPT-4.1 are still meaningfully better at complex reasoning, long-context analysis, and code generation. The best setup is hybrid: local for volume, cloud for brains.
How much electricity does a Mac Mini use running AI agents 24/7?
A Mac Mini M4 pulls about 5-7 watts at idle and 35-55 watts under AI workload. Running 24/7 with moderate AI use, expect about 15-25 kWh per month — roughly $3-5 depending on your electricity rate. That's dramatically less than a PC with a dedicated GPU.
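Those numbers back out like this — a sketch where the idle/load split and the $0.15/kWh rate are assumptions, so plug in your own:

```python
IDLE_W, LOAD_W = 6, 45   # midpoints of the idle and load ranges above
IDLE_H, LOAD_H = 12, 12  # assumed hours per day in each state
RATE = 0.15              # assumed $/kWh; varies by region

kwh_month = (IDLE_W * IDLE_H + LOAD_W * LOAD_H) / 1000 * 30
cost = kwh_month * RATE
print(f"{kwh_month:.1f} kWh/month, ~${cost:.2f}")
```

That lands around 18 kWh and under $3 a month — right in the quoted range, and an order of magnitude below a 200-400W GPU box.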
Setting up your first AI agent home lab? Join our free community at AI Creator Hub on Skool — we've got a whole channel dedicated to hardware setups and people sharing their Mac Mini configs.