Model overview

DeepSeek-R1 is a reasoning-focused model family from DeepSeek. The full model is a 671B-parameter mixture-of-experts model. DeepSeek also released distilled dense models based on Qwen and Llama model families, including 1.5B, 7B, 8B, 14B, 32B, and 70B variants.

For local users, the distilled models are the main story. They are much smaller than the full model and can run through local runtimes such as Ollama, LM Studio, vLLM, or SGLang depending on your hardware and goal.

DeepSeek-R1 is useful for math, coding, planning, and multi-step reasoning. It is not a magic replacement for every chat model. Reasoning models can be slower because they spend more tokens thinking before answering.

Step-by-step setup

Quick requirements

Before you start, you need:

  • A Windows, macOS, Linux, or WSL2 machine that can run Ollama
  • Enough free disk space for the DeepSeek-R1 model variant you choose
  • A terminal
  • A browser, if you plan to use Open WebUI

A first local test usually takes 20 to 45 minutes after Ollama is installed, plus download time for the model.

Step 1: Install Ollama

Download Ollama for your operating system:

https://ollama.com/download

After installing, confirm that the Ollama CLI is available:

ollama --version

Then confirm that the Ollama server is running by testing the local API:

curl http://localhost:11434/api/tags

If this fails, start Ollama from the desktop app or service manager and try again.
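The same check can be scripted. The sketch below is a minimal illustration, assuming the default address used in this guide and the `models` / `name` fields that `/api/tags` returns; the function names are hypothetical helpers, not part of Ollama.

```python
import json
import urllib.error
import urllib.request
from typing import Optional

OLLAMA_URL = "http://localhost:11434"  # default address used throughout this guide


def model_names(tags_payload: dict) -> list:
    """Extract model names from a decoded /api/tags response."""
    return [m["name"] for m in tags_payload.get("models", [])]


def fetch_tags(base_url: str = OLLAMA_URL, timeout: float = 3.0) -> Optional[dict]:
    """Return the decoded /api/tags response, or None if Ollama is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            return json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, or a body that is not valid JSON.
        return None
```

Calling `fetch_tags()` and getting `None` back suggests the server is not running; otherwise `model_names(fetch_tags())` lists what is installed.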

Step 2: Choose the right DeepSeek-R1 model size

Ollama exposes DeepSeek-R1 through tags. The common tags include:

deepseek-r1:1.5b
deepseek-r1:7b
deepseek-r1:8b
deepseek-r1:14b
deepseek-r1:32b
deepseek-r1:70b
deepseek-r1:671b

The download sizes in Ollama are not the same as required system memory during a long session, but they are a useful warning sign:

| Tag | Ollama download size | Practical use |
| --- | --- | --- |
| deepseek-r1:1.5b | about 1.1 GB | Fast testing, weak reasoning |
| deepseek-r1:7b | about 4.7 GB | Entry-level local reasoning |
| deepseek-r1:8b | about 5.2 GB | General local testing |
| deepseek-r1:14b | about 9.0 GB | Better reasoning on stronger machines |
| deepseek-r1:32b | about 20 GB | High-quality workstation option |
| deepseek-r1:70b | about 43 GB | High-memory workstation or server |
| deepseek-r1:671b | hundreds of GB | Server-class deployment |

If you are unsure, start with 8b or 14b. If you need better reasoning and have enough RAM or unified memory, try 32b. If the model becomes slow, unstable, or causes memory pressure, use a smaller tag.
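As a rough way to turn that advice into a rule of thumb, the sketch below picks the largest tag whose download size, multiplied by a safety factor, fits in your memory. The sizes are the approximate figures from the table above; the 1.5x headroom factor is an assumption for illustration, not a published requirement.

```python
from typing import Optional

# Approximate Ollama download sizes in GB, taken from the table above.
DOWNLOAD_GB = {
    "deepseek-r1:1.5b": 1.1,
    "deepseek-r1:7b": 4.7,
    "deepseek-r1:8b": 5.2,
    "deepseek-r1:14b": 9.0,
    "deepseek-r1:32b": 20.0,
    "deepseek-r1:70b": 43.0,
}

HEADROOM = 1.5  # assumed safety factor for context, cache, and other apps


def largest_tag_that_fits(memory_gb: float, headroom: float = HEADROOM) -> Optional[str]:
    """Return the biggest tag whose padded size fits in memory_gb, or None."""
    fitting = [
        (size, tag)
        for tag, size in DOWNLOAD_GB.items()
        if size * headroom <= memory_gb
    ]
    return max(fitting)[1] if fitting else None


print(largest_tag_that_fits(16))  # deepseek-r1:14b
```

Treat the result as a starting point; a long context or a busy machine can still push a "fitting" model into swap.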

Step 3: Run DeepSeek-R1 locally

Pull and run a model:

ollama run deepseek-r1:8b

Try a reasoning prompt:

Solve this step by step: A project has 3 phases. Phase 1 takes 4 days, phase 2 takes twice as long, and phase 3 takes 5 days less than phase 2. How long is the full project?
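To judge the model's reply, it helps to know the expected answer in advance. The arithmetic for this prompt is short enough to check directly:

```python
phase_1 = 4
phase_2 = 2 * phase_1   # twice as long as phase 1
phase_3 = phase_2 - 5   # 5 days less than phase 2
total_days = phase_1 + phase_2 + phase_3

print(total_days)  # 15
```

A correct response should arrive at 15 days, however long its reasoning trace is.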

To stop the chat but keep the model installed, exit the session with /bye or Ctrl+D. Ollama keeps the model in its local model store until you remove it.

List installed models:

ollama list

Remove a model you no longer need:

ollama rm deepseek-r1:8b

Step 4: Use the local API

Ollama automatically exposes a local API on:

http://localhost:11434

Test the model list:

curl http://localhost:11434/api/tags

Use Ollama's native generate API:

curl http://localhost:11434/api/generate \
  -d '{
    "model": "deepseek-r1:8b",
    "prompt": "Give me a short checklist for testing a local model install.",
    "stream": false
  }'
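The same request can be made from Python with only the standard library. This is a minimal sketch that mirrors the curl call above; the helper names are hypothetical, and `generate` assumes a running Ollama server with the model already pulled.

```python
import json
import urllib.request


def build_generate_request(prompt, model="deepseek-r1:8b",
                           base_url="http://localhost:11434"):
    """Build the POST request for Ollama's native /api/generate endpoint."""
    body = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(body).encode("utf-8"),  # a request body makes this a POST
        headers={"Content-Type": "application/json"},
    )


def generate(prompt, **kwargs):
    """Send the request and return the generated text from the 'response' field."""
    with urllib.request.urlopen(build_generate_request(prompt, **kwargs)) as resp:
        return json.load(resp)["response"]
```

For example, `generate("Give me a short checklist for testing a local model install.")` returns the reply as a single string because streaming is disabled.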

For OpenAI-compatible clients, use Ollama's /v1 endpoint:

http://localhost:11434/v1

Example:

from openai import OpenAI

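# Point the OpenAI client at the local Ollama server instead of a remote provider.
client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="ollama",  # required by the client library; Ollama ignores the value
)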

response = client.chat.completions.create(
    model="deepseek-r1:8b",
    messages=[
        {"role": "user", "content": "Explain why local reasoning models can be slower."}
    ],
)

print(response.choices[0].message.content)
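DeepSeek-R1 normally wraps its reasoning in `<think>...</think>` tags before the final answer. Whether those tags show up in the returned text depends on your Ollama version and settings, so treat the following as a sketch under that assumption. The helper name is hypothetical.

```python
import re

# Matches the first <think>...</think> block, including newlines inside it.
THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text: str) -> tuple:
    """Return (reasoning, answer); reasoning is empty if no <think> block exists."""
    match = THINK_BLOCK.search(text)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    # Everything outside the think block is treated as the visible answer.
    answer = (text[:match.start()] + text[match.end():]).strip()
    return reasoning, answer
```

Passing the model's reply text through `split_reasoning` lets an application log the reasoning trace while showing users only the answer.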

Step 5: Connect DeepSeek-R1 to Open WebUI

Open WebUI is the easiest browser interface for local Ollama models.

If Open WebUI runs on the same host as Ollama, connect it to:

http://localhost:11434

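If Open WebUI runs in Docker and Ollama runs on the host, use the address below. Docker Desktop resolves this name automatically; on plain Linux Docker, start the container with --add-host=host.docker.internal:host-gateway so that it resolves: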

http://host.docker.internal:11434

After Open WebUI connects to Ollama, select your DeepSeek-R1 tag from the model dropdown and start a chat.

Use Open WebUI when you want:

  • Chat history
  • A browser interface
  • Document uploads
  • Multi-user access
  • Easy model switching
  • A more comfortable UI than the terminal

Step 6: Use better prompts for DeepSeek-R1

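DeepSeek's published usage recommendations for R1 are more specific than typical chat-model advice: keep all instructions in the user prompt, avoid a system prompt, and sample at a temperature between 0.5 and 0.7 (0.6 is the suggested value).

Use plain user prompts. Avoid moving complex behavior into a system prompt for this model family unless your runtime or application requires one.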

For reasoning tasks, ask for the result clearly:

Reason step by step and give the final answer at the end.

For math:

Please reason step by step and put the final answer in a clearly labeled final line.

If the model loops, repeats itself, or produces incoherent output, simplify the prompt, shorten the conversation context, or try a different model size.
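In code, following this advice mostly means sending a single user message with the instruction appended. The sketch below builds a request body for Ollama's native `/api/chat` endpoint; the helper name and the way the instruction is appended are illustrative choices, not a prescribed format.

```python
REASONING_SUFFIX = "Reason step by step and give the final answer at the end."


def build_chat_payload(task: str, model: str = "deepseek-r1:8b") -> dict:
    """Build an /api/chat body with one user message and no system prompt."""
    return {
        "model": model,
        "messages": [
            # All instructions live in the user turn, as recommended for R1.
            {"role": "user", "content": f"{task}\n\n{REASONING_SUFFIX}"}
        ],
        "stream": False,
    }
```

The resulting dictionary can be JSON-encoded and posted to `http://localhost:11434/api/chat`.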

Step 7: Performance and hardware expectations

Local model speed depends on:

  • Model size
  • Quantization
  • RAM or unified memory
  • VRAM
  • CPU speed
  • GPU support
  • Context length
  • Prompt length
  • Other apps using memory

The 1.5B and 7B models are easier to run but less capable. The 14B and 32B models are stronger but need more memory. The 70B model is for high-memory workstations. The 671B model should be treated as a specialized server deployment.
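A back-of-the-envelope estimate makes the size differences concrete: the weights alone need roughly parameters times bits per weight, divided by 8, in bytes. The sketch below applies that formula. It is an approximation only; real quantized files are somewhat larger, and the context cache and runtime overhead come on top.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    # params_billion * 1e9 weights * (bits / 8) bytes each, divided by 1e9.
    return params_billion * bits_per_weight / 8


print(weight_memory_gb(8, 4))    # 4.0  (8B model at about 4 bits per weight)
print(weight_memory_gb(32, 4))   # 16.0
print(weight_memory_gb(70, 16))  # 140.0 (70B model at 16-bit precision)
```

This is why quantization matters so much for local use: the same 70B model shrinks by roughly a factor of four when stored at about 4 bits per weight instead of 16.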

Do not judge DeepSeek-R1 only by the smallest model. The small distilled variants are convenient, but they are not the full model.

Common problems

Ollama says the model does not exist

Check the tag spelling and pull the model explicitly:

ollama pull deepseek-r1:8b

Then confirm it appears in the local model list:

ollama list

Responses are extremely slow

Use a smaller model, close memory-heavy apps, reduce context length, or move to a machine with more memory. If your system starts swapping to disk, the model is too large for comfortable use.
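To put a number on "slow", you can compute tokens per second from the timing fields in a non-streaming `/api/generate` response. This sketch assumes the `eval_count` field (generated tokens) and the `eval_duration` field (nanoseconds) that Ollama includes in that response; the function name is hypothetical.

```python
def tokens_per_second(response: dict) -> float:
    """Generation speed from a decoded non-streaming /api/generate response."""
    seconds = response["eval_duration"] / 1e9  # eval_duration is in nanoseconds
    return response["eval_count"] / seconds


# Example with made-up timing values: 120 tokens generated in 6 seconds.
print(tokens_per_second({"eval_count": 120, "eval_duration": 6_000_000_000}))
```

Comparing this figure across model tags on the same prompt gives a more reliable picture than judging by feel.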

The model repeats itself or gets stuck thinking

Try a shorter prompt, a temperature between 0.5 and 0.7 (DeepSeek suggests 0.6), or another model size. DeepSeek-R1 can generate long reasoning traces; that is useful for hard tasks but annoying for simple ones.
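Both knobs can be set per request through the `options` object of Ollama's native API. The sketch below uses the temperature of 0.6 that DeepSeek's model card suggests and caps output length with `num_predict`; the cap of 2048 tokens is an arbitrary assumption, and the helper name is hypothetical.

```python
def build_bounded_request(prompt: str, model: str = "deepseek-r1:8b",
                          temperature: float = 0.6,
                          max_new_tokens: int = 2048) -> dict:
    """Build an /api/generate body with a sampling temperature and an output cap."""
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {
            "temperature": temperature,     # 0.5 to 0.7 is the suggested range
            "num_predict": max_new_tokens,  # hard limit on generated tokens
        },
    }
```

A cap stops runaway loops, but it can also cut off a legitimate answer mid-thought, so raise it for hard problems.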

Open WebUI cannot see DeepSeek-R1

Confirm Ollama has the model:

ollama list

Then confirm Open WebUI points to the right Ollama base URL. Use http://host.docker.internal:11434 when Open WebUI is in Docker and Ollama is on the host.
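If you are not sure which base URL is reachable from where Open WebUI runs, a small probe can try the two addresses from this guide in order. This is a diagnostic sketch with hypothetical helper names; run it in the same environment (host or container) as the UI.

```python
import urllib.error
import urllib.request
from typing import Optional

CANDIDATE_URLS = (
    "http://localhost:11434",
    "http://host.docker.internal:11434",
)


def can_reach(base_url: str, timeout: float = 2.0) -> bool:
    """Return True if /api/tags answers at base_url within the timeout."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout):
            return True
    except (urllib.error.URLError, OSError):
        return False


def first_reachable(urls=CANDIDATE_URLS, probe=can_reach) -> Optional[str]:
    """Return the first URL the probe accepts, or None if none respond."""
    return next((u for u in urls if probe(u)), None)
```

`first_reachable()` returning `None` means neither address answers from that environment, which points to a networking issue rather than a missing model.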

The local API works in curl but not in a client

Check whether the client expects Ollama's native API or an OpenAI-compatible endpoint. Use http://localhost:11434 for native Ollama and http://localhost:11434/v1 for OpenAI-compatible clients.

Security and privacy notes

Running DeepSeek-R1 locally keeps inference on your machine when you use a local runtime and local UI. That does not automatically make every workflow private.

Check:

  • Whether your chat UI syncs data
  • Whether a third-party app sends telemetry
  • Whether documents are uploaded to a cloud service
  • Whether API clients call a remote provider instead of Ollama
  • Whether model licenses fit your intended use

For sensitive work, test with dummy data first and confirm the network path.
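One simple part of confirming the network path is checking that a client's configured base URL actually points at the local machine. The sketch below treats `localhost` and loopback IP addresses as local and everything else as remote; it is an illustrative helper, not a full network audit.

```python
import ipaddress
from urllib.parse import urlparse


def is_loopback_endpoint(url: str) -> bool:
    """Return True if the URL's host is localhost or a loopback IP address."""
    host = urlparse(url).hostname
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # A DNS name other than localhost; treat it as non-local.
        return False


print(is_loopback_endpoint("http://localhost:11434/v1"))   # True
print(is_loopback_endpoint("https://api.example.com/v1"))  # False
```

Note that `host.docker.internal` is reported as non-local by this check, since traffic leaves the container even though it stays on the same machine.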

Background, planning, and caveats

Approximate planning cost (U.S.)

Local DeepSeek-R1 costs are driven mostly by compute:

  • Hardware: larger tags and longer context windows need more RAM/VRAM and disk space.
  • Storage: each model tag has a fixed download size, and the local model store grows with every tag you keep.
  • Optional cloud/VPS: use VPS/GPU instances only when you need predictable uptime or higher throughput.
  • API/model costs: local runtime lowers per-request cloud costs, but external model providers or web services can still bill usage.
  • Optional managed model services: useful if model upgrade cadence and hardware replacement become the bottleneck.

Cost breakdown

  • one-time local setup:
    • model-capable hardware,
    • SSD/storage,
    • local backup media.
  • recurring local setup:
    • electricity and connectivity costs for longer inference runs,
    • any third-party API usage outside Ollama.

Data persistence and backups

Keep Ollama model artifacts and runtime state available:

  • preserve downloaded model artifacts after ollama pull,
  • back up prompt templates and wrapper scripts used by your local apps,
  • version your Open WebUI and client configuration when changing providers.

Credential handling

  • Avoid hardcoding provider endpoints and API tokens in scripts.
  • For UI or API integrations, keep keys in secure environment variables or secret storage.
  • Rotate keys before exposing local inference to another device.
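A minimal way to keep endpoints and keys out of scripts is to read them from environment variables with local defaults. In the sketch below, the variable names `LOCAL_LLM_BASE_URL` and `LOCAL_LLM_API_KEY` are hypothetical; the defaults match the local Ollama example earlier in this guide.

```python
import os


def load_client_settings(env=os.environ) -> dict:
    """Read the endpoint and key from the environment, falling back to local Ollama."""
    return {
        # Hypothetical variable names; pick whatever fits your project.
        "base_url": env.get("LOCAL_LLM_BASE_URL", "http://localhost:11434/v1"),
        "api_key": env.get("LOCAL_LLM_API_KEY", "ollama"),
    }
```

The returned values can be passed straight to the OpenAI client constructor, so switching between local and remote providers never requires editing code.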

Security risk and operational cautions

  • Local inference is private only if the model and your data stay on trusted devices.
  • Reasoning models often produce longer output; sanitize any generated content before sending it to downstream automation.
  • Don't run unknown model binaries or prompts against sensitive environments without dry-run validation.

Questions before installing

  • Is your workload better served by a reasoning model (any tag from 1.5b to 671b) or by a faster, more deterministic chat model?
  • What minimum quality threshold and latency are acceptable in your use case?
  • Do you have a clear fallback if a model tag is unavailable or replaced upstream?

Red flags

  • Choosing a model size that clearly exceeds your machine capacity.
  • Assuming full local privacy while the UI or integrations sync history externally.
  • No documented rollback plan if a model update changes behavior.

Rollback and update guidance

For rollback:

  • keep working model tags and configs as checkpoints,
  • rerun with the previous tag, or a smaller one, when a new version causes regressions.

For updates:

  • refresh model metadata with ollama pull,
  • check /api/tags and run a short benchmark prompt before switching workflows,
  • keep the previous tag available for quick fallback.
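The update checks can be reduced to a small gate: the new tag must be installed, and its reply to a known prompt must end with the expected answer. The sketch below assumes you reuse the project-phase prompt from Step 3, whose answer is 15 days; the function names and the final-line check are illustrative simplifications.

```python
def final_line(reply: str) -> str:
    """Return the last non-empty line of a reply, or an empty string."""
    lines = [ln.strip() for ln in reply.splitlines() if ln.strip()]
    return lines[-1] if lines else ""


def ready_to_switch(installed_tags, tag: str, reply: str, expected: str = "15") -> bool:
    """True if the tag is installed and the reply's final line has the expected answer."""
    return tag in installed_tags and expected in final_line(reply)
```

A substring match on the final line is crude, so for real workflows a handful of prompts with stricter answer parsing would give more confidence.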

Bottom line

Use Ollama for the simplest local DeepSeek-R1 setup. Start with deepseek-r1:8b or deepseek-r1:14b, move up to 32b if your hardware can handle it, and treat 70b and 671b as serious hardware projects.

For a browser UI, connect Open WebUI to Ollama. For apps and scripts, use http://localhost:11434 for Ollama's native API or http://localhost:11434/v1 for OpenAI-compatible clients.