Why RAM requirements vary
Local AI models are not normal desktop apps. A browser, spreadsheet, or photo editor uses memory based on the document and features you open. A local language model also needs to load large model weights and keep prompt context available while generating an answer.
That is why two people with "16 GB RAM" can have very different experiences. One person may run a small 3B model with short prompts and no other heavy apps. Another may try a 14B model, a long document, a browser with 40 tabs, Docker, Open WebUI, and a coding assistant. The RAM number is the same, but the workload is not.
The safest way to think about memory is in layers:
- The operating system needs memory.
- The AI app needs memory.
- The model needs memory.
- The prompt and context need memory.
- Documents, embeddings, tools, and browser interfaces may need more memory.
- A GPU reduces pressure on system RAM only to the extent that the model and context fit in VRAM.
When memory is tight, local AI may still run, but it can become slow or unstable as the system starts swapping memory to disk.
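The layers above can be turned into a rough budget. The sketch below is only an illustration: the function name and every number in it are hypothetical placeholders, not measurements, so replace them with values you observe on your own machine.

```python
def remaining_headroom_gb(total_gb, layers_gb):
    """Return memory left after subtracting every layer's estimated use."""
    return total_gb - sum(layers_gb.values())

# Hypothetical estimates for a 16 GB machine; measure your own system instead.
layers = {
    "operating_system": 4.0,
    "ai_app": 1.0,
    "model_weights": 2.5,
    "prompt_and_context": 1.0,
    "browser_and_tools": 3.0,
}

headroom = remaining_headroom_gb(16.0, layers)
print(f"Estimated headroom: {headroom:.1f} GB")
```

If the result is close to zero or negative, the workload is likely to swap to disk, which is the point where a smaller model or fewer background apps helps most.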
RAM, VRAM, and unified memory are different
System RAM
System RAM is the normal memory installed in a Windows or Linux PC. It supports the operating system, apps, background tasks, browser tabs, Docker, and any model work that is not fully handled by a GPU.
If you run local AI without a dedicated GPU, system RAM becomes the main memory pool for model loading and inference. That works, but generation is usually slower than with GPU acceleration.
GPU VRAM
VRAM is memory on a dedicated graphics card. For local language models, VRAM matters because it can hold model layers and speed up generation. More VRAM usually gives you more room for larger models, larger context, and smoother performance.
VRAM is not a complete replacement for system RAM. Your operating system and apps still need normal RAM. If the model does not fit in VRAM, part of the work may spill into system RAM, which usually slows down generation.
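A rough way to picture the spill is to estimate how many model layers fit in VRAM. This is a simplified sketch, not how any particular runtime decides: it assumes all layers are the same size, and `reserve_gb` is a hypothetical allowance for the context cache and buffers.

```python
def layers_that_fit(model_gb, n_layers, vram_gb, reserve_gb=1.0):
    """Estimate how many equally sized layers fit in VRAM.

    reserve_gb is an assumed allowance for context cache and buffers.
    """
    per_layer_gb = model_gb / n_layers
    usable_gb = max(vram_gb - reserve_gb, 0.0)
    return min(n_layers, int(usable_gb // per_layer_gb))

# Hypothetical: a 9 GB model file with 40 layers on an 8 GB card.
on_gpu = layers_that_fit(model_gb=9.0, n_layers=40, vram_gb=8.0)
print(f"{on_gpu} of 40 layers on the GPU, {40 - on_gpu} left in system RAM")
```

The layers left in system RAM run on the CPU or are transferred as needed, which is why a model that almost fits in VRAM can still feel much slower than one that fits completely.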
Apple unified memory
Apple Silicon Macs use unified memory, which is shared across the CPU, GPU, and other parts of the chip. That can be efficient for local AI because the GPU accesses the same memory pool instead of being limited to a separate, fixed amount of VRAM.
The tradeoff is that unified memory is still shared with the operating system and every other app. A 16 GB Mac does not give a local model the full 16 GB. A 32 GB or 64 GB Mac gives much more practical room for local AI, especially if you want to keep a browser, editor, and documents open.
Simple memory planning tiers
These tiers are planning guidance, not guarantees. Quantization, model family, context length, GPU support, drivers, and app settings can move a model up or down.
| Memory tier | What it is good for | Sensible first model range | Main caveat |
|---|---|---|---|
| 8 GB RAM | Learning, tiny models, basic offline tests | 270M to 1.7B class models | Limited headroom and slow responses |
| 16 GB RAM | Beginner local chat, small summaries, light coding help | 1B to 4B class models, sometimes 7B with care | Background apps and long context hurt quickly |
| 32 GB RAM | Regular local AI use, better multitasking, small document workflows | 3B to 8B class models, sometimes 14B with the right setup | Not unlimited; large context still matters |
| 64 GB RAM | Larger models, heavier documents, local dev workflows | 7B to 14B+ class models, depending on VRAM and quantization | GPU/VRAM may still be the speed bottleneck |
| 128 GB+ RAM | Advanced local experiments, large context, multiple services | Larger local models and heavier pipelines | Expensive and still not a cloud frontier replacement |
If you are new, start at the low end of your tier's model range. A small model that answers quickly is more useful than a bigger model that makes the machine feel frozen.
What you can expect with 8 GB RAM
An 8 GB laptop or older desktop can still teach you local AI basics. It can run tiny models, confirm that Ollama or LM Studio works, and let you experiment with offline prompts.
Good first tests include very small Ollama tags such as:
ollama run gemma3:1b
or:
ollama run qwen3:1.7b
On very weak systems, try an even smaller tag when available:
ollama run gemma3:270m
The problem with 8 GB RAM is not only whether the model loads. The operating system, browser, antivirus, update services, and the AI app all compete for memory. You may need to close other apps and keep prompts short.
Use 8 GB RAM for learning, not for judging whether local AI is good enough overall.
What you can expect with 16 GB RAM
For many beginners, 16 GB RAM is the first realistic local AI tier. LM Studio's official system requirements recommend at least 16 GB RAM, and that matches practical experience for small local models.
On 16 GB RAM, start with 1B to 4B models. Examples include:
ollama run llama3.2:3b
ollama run gemma3:4b
ollama run qwen3:4b
Some 7B or 8B models may work depending on quantization, app settings, GPU support, and context length. Treat that as a careful experiment, not a promise. If the computer starts swapping memory to disk, the model may technically run but feel too slow for daily use.
For a new local AI user, 16 GB is enough to find out whether local models are useful. It is not ideal for heavy document chat, running several tools at once, or keeping large models loaded all day.
What you can expect with 32 GB RAM
32 GB RAM is the better default for regular local AI use. It gives the operating system, browser, notes, coding editor, local model app, and smaller models room to coexist.
This is the tier where local AI starts to feel less like a fragile demo. You can test 4B to 8B models more comfortably, try short document workflows, and run tools such as Ollama plus Open WebUI without every background app becoming a problem.
32 GB also gives you room to compare models instead of judging everything from the smallest possible option. That matters because task fit is often more important than size. A small coding model may be better for code than a larger general chat model.
Still, 32 GB does not make every local model practical. Large 14B, 27B, 30B, or 32B class models can still be slow or memory-heavy, especially with long context.
What you can expect with 64 GB RAM or more
64 GB RAM is useful when local AI becomes a real workflow rather than a curiosity. It helps with larger models, long prompts, document workflows, development environments, and multitasking.
This tier is especially helpful if you run:
- Ollama or LM Studio with larger local models.
- Open WebUI with documents and chat history.
- A coding editor, browser, Docker, and local AI at the same time.
- Retrieval or document workflows that keep additional data in memory.
- Multiple model experiments during the same session.
For serious local AI, 64 GB system RAM is often paired with a dedicated GPU that has enough VRAM. System RAM gives headroom; VRAM usually determines how much of the model can run quickly on the GPU.
128 GB or more is advanced territory. It can be valuable for large local experiments, but it is easy to overspend if your real use case is short chat, summaries, or occasional coding help.
Why model file size is not the full memory requirement
Ollama model library pages show useful file sizes. For example, the library lists small tags such as llama3.2:1b, llama3.2:3b, gemma3:1b, gemma3:4b, qwen3:1.7b, and qwen3:4b with their own download sizes and context windows.
Those file sizes are helpful, but they are not the whole memory story. A loaded model also needs memory for runtime overhead, temporary buffers, prompt tokens, output tokens, and the context cache. The context cache grows with context length, so a longer context means higher memory use even when the model file stays the same.
This is why a model can fit on disk but fail to load, load but run slowly, or work with a short prompt but struggle with a large document.
Use file size as a warning sign, not a guarantee. If the model file is already close to your free memory, choose a smaller model.
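As a back-of-the-envelope check, the weights alone take roughly the parameter count multiplied by the bits stored per weight. The helper below is a hypothetical sketch under that assumption. Real quantized files mix precisions and carry metadata, so the sizes shown on a model page will differ somewhat, and the runtime overhead described above comes on top.

```python
def weights_gb(params_billion, bits_per_weight):
    """Approximate size of the model weights in gigabytes (10^9 bytes)."""
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9

# Compare the same hypothetical 4B-parameter model at three precisions.
for bits in (16, 8, 4):
    print(f"4B parameters at {bits}-bit: about {weights_gb(4, bits):.1f} GB")
```

The same model shrinks by about a factor of four between 16-bit and 4-bit storage, which is why quantization moves a model between memory tiers.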
Context length can change everything
Context length is the amount of text, measured in tokens, that the model can keep available while answering. It affects memory because the model needs room to process and remember the prompt, system instructions, chat history, documents, and tool output.
Ollama's current context-length documentation says larger context requires more memory and gives VRAM-based defaults for different hardware classes. The practical lesson is simple: do not max out context length just because a model card says a large number is possible.
Long context is useful for:
- Large documents.
- Long chats.
- Coding tools with many files.
- Agents that pass around tool output.
- Web search or retrieval workflows.
Long context is also one of the fastest ways to make a model that normally fits become slow or unstable. If a model fails only on long prompts, reduce context length or split the task.
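To see why long context is expensive, it helps to estimate the context cache directly. The sketch below assumes a standard transformer that stores one key and one value vector per layer, per key-value head, per token, at 16 bits each. The model shape is hypothetical, and runtimes that quantize the cache or use sliding-window attention will use less.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_tokens, bytes_per_value=2):
    """Keys and values for every layer, KV head, and token position."""
    return 2 * n_layers * n_kv_heads * head_dim * context_tokens * bytes_per_value

# Hypothetical model shape; real values come from the model's own config.
for ctx in (4096, 32768):
    gib = kv_cache_bytes(32, 8, 128, ctx) / 2**30
    print(f"context {ctx}: about {gib:.1f} GiB of cache")
```

Under these assumptions the cache grows in direct proportion to context length, so an eightfold increase in context costs eight times the cache memory on top of the model weights.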
Windows laptop and desktop guidance
On Windows, memory planning depends heavily on whether you have a dedicated GPU.
For a Windows laptop with integrated graphics and 16 GB RAM, stay conservative. Use small models and avoid running Docker, a browser UI, a coding IDE, and a large model at the same time.
For a Windows desktop with 32 GB RAM and a GPU with 8 GB to 12 GB VRAM, local AI becomes more comfortable. You can test small to medium models, use Ollama or LM Studio, and add Open WebUI after the base model works.
For a stronger desktop with 64 GB RAM and 16 GB to 24 GB VRAM, you have room for heavier local AI experiments. That still does not mean every model will run well. GPU driver support, quantization, context length, and cooling still matter.
Ollama's current Windows docs list Windows 10 22H2 or newer as the minimum and describe a native Windows application that supports NVIDIA and AMD Radeon GPUs. LM Studio's current requirements list x64 and ARM Windows support, AVX2 for x64 CPUs, 16 GB RAM recommended, and 4 GB dedicated VRAM recommended.
Mac unified memory guidance
For Apple Silicon Macs, think in unified memory tiers:
- 8 GB unified memory: tiny local models and learning only.
- 16 GB unified memory: small models and modest context.
- 32 GB unified memory: a better daily local AI starting point.
- 64 GB+ unified memory: larger models, heavier workflows, and more multitasking.
The advantage is that Apple Silicon can use a shared memory architecture for CPU and GPU work. Apple describes unified memory as a large shared pool accessible across the chip, which can help on-device AI workloads.
The limitation is that memory is usually not upgradeable after purchase. If you are buying a Mac for local AI, do not treat 8 GB as a safe long-term choice. If local AI is a real reason for the purchase, 32 GB or more is a more practical target.
GPU VRAM guidance
If you have a dedicated GPU, VRAM affects both model size and speed. LM Studio recommends at least 4 GB dedicated VRAM, but that is a starting point. For local AI, 8 GB is more useful, 12 GB is a better budget target, and 16 GB to 24 GB gives much more room.
Use this rough guidance:
- 4 GB VRAM: small models and learning.
- 8 GB VRAM: better small-model experience.
- 12 GB VRAM: stronger budget target for daily local AI.
- 16 GB VRAM: more room for larger models and context.
- 24 GB+ VRAM: serious local model experimentation.
Do not shop by GPU name alone. Laptop GPUs may have less VRAM and lower sustained power than desktop GPUs with similar branding. AMD, NVIDIA, Apple, and CPU-only paths also have different driver and acceleration behavior.
Ollama's hardware support documentation currently lists NVIDIA, AMD Radeon, Apple Metal, and experimental Vulkan paths, with caveats by platform and driver. Check current support before assuming an older or unusual GPU will accelerate local AI.
Disk space is separate from RAM
Disk space stores the model files. RAM and VRAM load and run them. You need both.
A beginner can burn through disk space quickly by downloading several model families, each in multiple sizes. Keep only the models you actually use. For a normal local AI setup, a fast SSD matters more than raw capacity because models load faster and apps stay more responsive on fast storage.
Practical storage guidance:
- Keep at least 20 GB free before experimenting.
- Prefer 100 GB+ free if you plan to compare several models.
- Use a 1 TB SSD if local AI is becoming a regular workflow.
- Track model downloads so your system drive does not fill unexpectedly.
If a model fails to download, check disk space before blaming RAM.
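A quick free-space check before downloading can be scripted with the Python standard library. This is a minimal sketch: the function name is hypothetical, and it checks the drive that holds the current directory, which may not be the drive where your AI app stores its models.

```python
import shutil

def free_disk_gb(path="."):
    """Free space, in GB (10^9 bytes), on the drive that holds path."""
    return shutil.disk_usage(path).free / 1e9

MIN_FREE_GB = 20  # threshold taken from the guidance above
free = free_disk_gb()
status = "ok" if free >= MIN_FREE_GB else "low"
print(f"{free:.0f} GB free ({status} for a first experiment)")
```

Pass the folder where your models are actually stored as `path` to check the right drive.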
How to test your current machine safely
The best RAM test is a real local model test.
- Install Ollama or LM Studio.
- Start with a tiny or small model.
- Close heavy background apps.
- Ask a short prompt.
- Watch memory use in Task Manager, Activity Monitor, or your Linux system monitor.
- Try one slightly larger model only if the first one is stable.
- Reduce context length if long prompts cause trouble.
For Ollama, start small:
ollama run llama3.2:3b
If that is slow, step down:
ollama run gemma3:1b
If it works well, compare another small model rather than jumping straight to a large one:
ollama run qwen3:4b
You are looking for a practical result, not a benchmark trophy. If the model answers in a tolerable time and the computer stays responsive, the setup is useful enough to continue testing.
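If you want a number to go with that impression, Ollama's local REST API reports token counts and timings with each response. The sketch below assumes the default endpoint on port 11434 and the `eval_count` and `eval_duration` (nanoseconds) response fields; check the current API documentation before relying on them. The helper names and the sample values are hypothetical.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumed default local endpoint

def tokens_per_second(result):
    """Generation speed from the eval_count and eval_duration (ns) fields."""
    return result["eval_count"] / (result["eval_duration"] / 1e9)

def timed_prompt(model, prompt, num_ctx=4096):
    """Send one non-streaming prompt and return the parsed JSON response."""
    body = json.dumps({
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {"num_ctx": num_ctx},  # keep context modest while testing
    }).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)

# Offline demonstration with made-up values; no server is contacted here.
sample = {"eval_count": 120, "eval_duration": 6_000_000_000}
print(f"{tokens_per_second(sample):.1f} tokens per second")
```

With Ollama running, `tokens_per_second(timed_prompt("llama3.2:3b", "Explain RAM in two sentences."))` gives a figure for your own machine. Lowering `num_ctx` is one way to apply the advice about reducing context length when long prompts cause trouble.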
Questions to ask before buying more RAM or a new machine
- What model size do you actually need for your task?
- Is the current bottleneck system RAM, VRAM, CPU speed, storage, or context length?
- Can your current laptop or desktop RAM be upgraded?
- Would a smaller model solve the problem?
- Are you running too many background apps while testing?
- Do you need local privacy, offline use, or just cheaper access?
- Would a cloud API be simpler for occasional high-quality answers?
- Are you buying for today's real workflow or for vague future AI plans?
These questions prevent the most expensive mistake: buying more machine before proving the workflow.
Red flags and common errors
Do not assume disk size equals memory fit. A model that downloads successfully can still fail or run poorly when loaded.
Do not assume a larger context window is free. Bigger context can increase memory use enough to turn a working setup into a bad one.
Do not keep a large browser session, Docker, a coding IDE, a video call, and a local model open on a low-RAM laptop unless you enjoy debugging memory pressure.
Do not assume a GPU helps if the model cannot fit enough layers in VRAM or if the driver path is not supported.
Do not compare a tiny local model to a premium cloud model and call local AI bad. Compare models that fit your hardware and task.
Do not buy an 8 GB machine for local AI in 2026 unless you only want tiny experiments.
Bottom line
You can try local AI with 8 GB RAM, but 16 GB is the practical beginner floor, 32 GB is the better everyday target, and 64 GB or more is useful for larger models, longer context, and heavier workflows. VRAM and Apple unified memory change the calculation, but they do not remove the need for headroom. Start with a small model, measure your own machine, and upgrade only after you know what is actually limiting the workflow.