Can I run DeepSeek-R1 completely offline?

Yes, after Ollama and the model files are already installed locally. Downloading the model requires internet access, but normal local inference can run without cloud calls.

Which DeepSeek-R1 model should I start with?

Start with `deepseek-r1:8b` or `deepseek-r1:14b`. Use `32b` if you have enough memory and want better reasoning. Use smaller models for speed and larger models for quality.

What is the difference between the full model and distilled models?

The full DeepSeek-R1 model is the large 671B-parameter model. The distilled models are smaller dense models trained from DeepSeek-R1-generated reasoning data and are much more practical for local machines.

Why is DeepSeek-R1 slow?

Reasoning models often generate more internal reasoning tokens before the final answer. Larger model sizes and long context windows also slow local inference.

Can I use DeepSeek-R1 with Open WebUI?

Yes. Run DeepSeek-R1 in Ollama, then connect Open WebUI to Ollama and select the installed DeepSeek-R1 model.

How to Install DeepSeek-R1 Locally With Ollama

Model overview

DeepSeek-R1 is a reasoning-focused model family from DeepSeek. The full model is a 671B-parameter mixture-of-experts model. DeepSeek also released distilled dense models based on Qwen and Llama model families, including 1.5B, 7B, 8B, 14B, 32B, and 70B variants.

For local users, the distilled models are the main story. They are much smaller than the full model and can run through local runtimes such as Ollama, LM Studio, vLLM, or SGLang depending on your hardware and goal.

DeepSeek-R1 is useful for math, coding, planning, and multi-step reasoning. It is not a magic replacement for every chat model. Reasoning models can be slower because they spend more tokens thinking before answering.

Step-by-step setup

Quick requirements

Before you start, prepare a Windows, macOS, Linux, or WSL2 machine that can run Ollama, enough disk space for the DeepSeek-R1 model variant you choose, a terminal, and a browser if you plan to use Open WebUI. A first local test usually takes 20 to 45 minutes after Ollama is installed, plus download time for the model.

Step 1: Install Ollama

Download Ollama for your operating system:

https://ollama.com/download

After installing, confirm that the Ollama CLI is available:

ollama --version

Then confirm that the Ollama server is running by querying the local API:

curl http://localhost:11434/api/tags

If this fails, start Ollama from the desktop app or service manager and try again.
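The same check can be scripted. The following is a minimal Python sketch, assuming Ollama listens on the default http://localhost:11434 and that /api/tags returns a JSON object with a `models` array whose entries carry a `name` field. The helper names `parse_tags` and `installed_tags` are made up for illustration.

```python
import json
import urllib.request
from typing import List, Optional

OLLAMA_URL = "http://localhost:11434"  # default local address used in this guide


def parse_tags(body: bytes) -> List[str]:
    """Extract model names from an /api/tags response body."""
    payload = json.loads(body)
    return [m["name"] for m in payload.get("models", [])]


def installed_tags(base_url: str = OLLAMA_URL, timeout: float = 5.0) -> Optional[List[str]]:
    """Return installed model tags, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            return parse_tags(resp.read())
    except OSError:  # URLError, connection refused, and timeouts are all OSError
        return None
```

Calling `installed_tags()` returns a list (possibly empty) when the server answers and `None` when it does not, which separates "Ollama is down" from "no models pulled yet".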

Step 2: Choose the right DeepSeek-R1 model size

Ollama exposes DeepSeek-R1 through tags. The common tags include:

deepseek-r1:1.5b
deepseek-r1:7b
deepseek-r1:8b
deepseek-r1:14b
deepseek-r1:32b
deepseek-r1:70b
deepseek-r1:671b

The download sizes in Ollama are not the same as required system memory during a long session, but they are a useful warning sign:

Tag	Ollama download size	Practical use
`deepseek-r1:1.5b`	about 1.1 GB	Fast testing, weak reasoning
`deepseek-r1:7b`	about 4.7 GB	Entry-level local reasoning
`deepseek-r1:8b`	about 5.2 GB	General local testing
`deepseek-r1:14b`	about 9.0 GB	Better reasoning on stronger machines
`deepseek-r1:32b`	about 20 GB	High-quality workstation option
`deepseek-r1:70b`	about 43 GB	High-memory workstation or server
`deepseek-r1:671b`	hundreds of GB	Server-class deployment

If you are unsure, start with 8b or 14b. If you need better reasoning and have enough RAM or unified memory, try 32b. If the model becomes slow, unstable, or causes memory pressure, use a smaller tag.
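As a rough way to turn that advice into a rule of thumb, the sketch below picks the largest tag whose download size, multiplied by a safety factor, fits in available memory. The 1.5x `headroom` factor is an assumption made for this example, not a figure from Ollama or DeepSeek, and `pick_tag` is a hypothetical helper.

```python
from typing import Optional

# Approximate Ollama download sizes in GB, copied from the table above.
# The 671b tag is omitted because its size is only given as "hundreds of GB".
DOWNLOAD_GB = {
    "deepseek-r1:1.5b": 1.1,
    "deepseek-r1:7b": 4.7,
    "deepseek-r1:8b": 5.2,
    "deepseek-r1:14b": 9.0,
    "deepseek-r1:32b": 20.0,
    "deepseek-r1:70b": 43.0,
}


def pick_tag(memory_gb: float, headroom: float = 1.5) -> Optional[str]:
    """Return the largest tag whose download size times `headroom` fits in memory.

    `headroom` is an assumed safety factor for context and other apps.
    """
    fitting = [tag for tag, gb in DOWNLOAD_GB.items() if gb * headroom <= memory_gb]
    # The dict is ordered smallest to largest, so the last fitting tag is the biggest.
    return fitting[-1] if fitting else None


print(pick_tag(16))  # deepseek-r1:14b
print(pick_tag(8))   # deepseek-r1:8b
print(pick_tag(1))   # None
```

Treat the result as a starting point only. Long contexts and other running applications can push real memory use well above the download size.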

Step 3: Run DeepSeek-R1 locally

Pull and run a model:

ollama run deepseek-r1:8b

Try a reasoning prompt:

Solve this step by step: A project has 3 phases. Phase 1 takes 4 days, phase 2 takes twice as long, and phase 3 takes 5 days less than phase 2. How long is the full project?
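To judge the model's answer, it helps to know the expected result in advance. The arithmetic behind the prompt, written as a short Python check:

```python
phase_1 = 4                # days, given in the prompt
phase_2 = 2 * phase_1      # "twice as long" as phase 1
phase_3 = phase_2 - 5      # "5 days less than phase 2"
total = phase_1 + phase_2 + phase_3
print(total)  # 15
```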

To stop the chat but keep the model installed, exit the session with /bye. Ollama keeps the model in its local model store until you remove it.

List installed models:

ollama list

Remove a model you no longer need:

ollama rm deepseek-r1:8b

Step 4: Use the local API

Ollama automatically exposes a local API on:

http://localhost:11434

Test the model list:

curl http://localhost:11434/api/tags

Use Ollama's native generate API:

curl http://localhost:11434/api/generate \
  -d '{
    "model": "deepseek-r1:8b",
    "prompt": "Give me a short checklist for testing a local model install.",
    "stream": false
  }'
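The same request can be sent from Python without extra dependencies. This sketch assumes the default address and the 8b tag used above, and it reads only the `response` field of the reply. `build_generate_request` and `generate` are illustrative helper names.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"


def build_generate_request(prompt: str, model: str = "deepseek-r1:8b") -> urllib.request.Request:
    """Build a non-streaming POST to Ollama's native /api/generate endpoint."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    return urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "deepseek-r1:8b", timeout: float = 300.0) -> str:
    """Send the prompt and return the generated text (needs a running Ollama)."""
    with urllib.request.urlopen(build_generate_request(prompt, model), timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

The generous timeout is deliberate: reasoning models can take a while before the first complete reply arrives when streaming is off.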

For OpenAI-compatible clients, use Ollama's /v1 endpoint:

http://localhost:11434/v1

Example:

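# Requires the openai package: pip install openai
from openai import OpenAI

# Point the client at Ollama's OpenAI-compatible endpoint.
client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="ollama",  # required by the client library, ignored by Ollama
)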

response = client.chat.completions.create(
    model="deepseek-r1:8b",
    messages=[
        {"role": "user", "content": "Explain why local reasoning models can be slower."}
    ],
)

print(response.choices[0].message.content)
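DeepSeek-R1 typically wraps its reasoning in <think> and </think> tags before the final answer. Whether those tags appear inside the returned content depends on the Ollama version and settings, so treat the following as a sketch for the inline-tag case only. `split_reasoning` is a hypothetical helper name.

```python
import re
from typing import Tuple

THINK_BLOCK = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text: str) -> Tuple[str, str]:
    """Split model output into (reasoning, answer).

    If no <think> block is present, reasoning is an empty string.
    """
    match = THINK_BLOCK.search(text)
    if match is None:
        return "", text.strip()
    reasoning = match.group(1).strip()
    # Remove the think block and keep everything around it as the answer.
    answer = (text[: match.start()] + text[match.end():]).strip()
    return reasoning, answer


reasoning, answer = split_reasoning("<think>4 + 8 + 3</think>\nThe project takes 15 days.")
print(answer)  # The project takes 15 days.
```

Separating the two parts makes it easier to show users only the answer while logging the reasoning for debugging.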

Step 5: Connect DeepSeek-R1 to Open WebUI

Open WebUI is a browser interface that works well with local Ollama models.

If Open WebUI runs on the same host as Ollama, connect it to:

http://localhost:11434

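If Open WebUI runs in Docker and Ollama runs on the host, use the address below. Docker Desktop resolves this name automatically; on Linux, start the container with --add-host=host.docker.internal:host-gateway.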

http://host.docker.internal:11434

After Open WebUI connects to Ollama, select your DeepSeek-R1 tag from the model dropdown and start a chat.

Use Open WebUI when you want:

- Chat history
- A browser interface
- Document uploads
- Multi-user access
- Easy model switching
- A more comfortable UI than the terminal

Step 6: Use better prompts for DeepSeek-R1

DeepSeek publishes usage recommendations for R1 that differ from typical chat-model prompting.

Use plain user prompts. Avoid stuffing complex behavior into a system prompt for this model family unless your runtime or application requires one.

For reasoning tasks, ask for the result clearly:

Reason step by step and give the final answer at the end.

For math:

Please reason step by step and put the final answer in a clearly labeled final line.

If the model loops, repeats, or produces incoherent output, reduce the prompt complexity, lower the context burden, or try a different model size.

Step 7: Performance and hardware expectations

Local model speed depends on:

- Model size
- Quantization
- RAM or unified memory
- VRAM
- CPU speed
- GPU support
- Context length
- Prompt length
- Other apps using memory

The 1.5B and 7B models are easier to run but less capable. The 14B and 32B models are stronger but need more memory. The 70B model is for high-memory workstations. The 671B model should be treated as a specialized server deployment.

Do not judge DeepSeek-R1 only by the smallest model. The small distilled variants are convenient, but they are not the full model.

Common problems

Ollama says the model does not exist

Check the tag:

ollama pull deepseek-r1:8b

Then run:

ollama list

Responses are extremely slow

Use a smaller model, close memory-heavy apps, reduce context length, or move to a machine with more memory. If your system starts swapping to disk, the model is too large for comfortable use.
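To put a number on "slow", the non-streaming /api/generate reply includes timing fields. This sketch assumes the `eval_count` (generated tokens) and `eval_duration` (nanoseconds) fields described in Ollama's API documentation. The helper name and the sample values are made up.

```python
def tokens_per_second(reply: dict) -> float:
    """Compute generation speed from an /api/generate reply.

    Assumes `eval_count` is the number of generated tokens and
    `eval_duration` is the generation time in nanoseconds.
    """
    duration_ns = reply.get("eval_duration", 0)
    if duration_ns <= 0:
        return 0.0
    seconds = duration_ns / 1e9
    return reply.get("eval_count", 0) / seconds


sample = {"eval_count": 100, "eval_duration": 2_000_000_000}  # made-up numbers
print(tokens_per_second(sample))  # 50.0
```

Comparing this figure before and after switching tags or closing other applications gives a more reliable signal than a subjective impression of speed.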

The model repeats itself or gets stuck thinking

Try a shorter prompt, a temperature of 0.5 to 0.7 (DeepSeek recommends 0.6), or another model size. DeepSeek-R1 can generate long reasoning traces, which helps on hard tasks but adds delay on simple ones.
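Sampling and length limits can be set per request through the `options` object of Ollama's native API. The sketch below assumes the `temperature` and `num_predict` option names from Ollama's documentation. The 0.6 default follows DeepSeek's published recommendation, while the 1024-token cap is an arbitrary example value.

```python
import json


def generate_payload(prompt: str, model: str = "deepseek-r1:8b",
                     temperature: float = 0.6, max_tokens: int = 1024) -> str:
    """Build a JSON body for /api/generate with sampling and length limits."""
    return json.dumps({
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {
            "temperature": temperature,  # DeepSeek suggests 0.5 to 0.7
            "num_predict": max_tokens,   # caps generated tokens, reasoning included
        },
    })


body = generate_payload("What is 17 * 23?")
print(json.loads(body)["options"])  # {'temperature': 0.6, 'num_predict': 1024}
```

A token cap stops runaway reasoning, but it can also cut off the final answer, so raise it for harder problems.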

Open WebUI cannot see DeepSeek-R1

Confirm Ollama has the model:

ollama list

Then confirm Open WebUI points to the right Ollama base URL. Use http://host.docker.internal:11434 when Open WebUI is in Docker and Ollama is on the host.

The local API works in curl but not in a client

Check whether the client expects Ollama's native API or an OpenAI-compatible endpoint. Use http://localhost:11434 for native Ollama and http://localhost:11434/v1 for OpenAI-compatible clients.
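A small helper can keep the two base URLs from being mixed up in scripts. This is an illustrative sketch: `endpoint_for` and its "native"/"openai" labels are invented for this example and are not part of Ollama.

```python
def endpoint_for(base_url: str, style: str) -> str:
    """Return the URL a client should use for the given API style.

    style is "native" (Ollama's own API) or "openai" (OpenAI-compatible).
    """
    root = base_url.rstrip("/")
    # Strip an existing /v1 suffix so it is never doubled.
    if root.endswith("/v1"):
        root = root[: -len("/v1")]
    if style == "native":
        return root
    if style == "openai":
        return root + "/v1"
    raise ValueError(f"unknown API style: {style!r}")


print(endpoint_for("http://localhost:11434/", "openai"))    # http://localhost:11434/v1
print(endpoint_for("http://localhost:11434/v1", "native"))  # http://localhost:11434
```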

Security and privacy notes

Running DeepSeek-R1 locally keeps inference on your machine when you use a local runtime and local UI. That does not automatically make every workflow private.

Check:

- Whether your chat UI syncs data
- Whether a third-party app sends telemetry
- Whether documents are uploaded to a cloud service
- Whether API clients call a remote provider instead of Ollama
- Whether model licenses fit your intended use

For sensitive work, test with dummy data first and confirm the network path.
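One cheap guard is to check, before sending anything, that a client's configured base URL actually points at the local machine. The sketch below only inspects the URL string and does not observe real network traffic. `is_loopback_endpoint` is a hypothetical helper.

```python
import ipaddress
from urllib.parse import urlparse


def is_loopback_endpoint(url: str) -> bool:
    """Return True if the URL points at this machine's loopback interface."""
    host = urlparse(url).hostname
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Any other hostname (including host.docker.internal) counts as non-loopback.
        return False


print(is_loopback_endpoint("http://localhost:11434"))      # True
print(is_loopback_endpoint("http://127.0.0.1:11434/v1"))   # True
print(is_loopback_endpoint("https://api.example.com/v1"))  # False
```

A check like this catches a misconfigured base URL, but confirming the real network path still requires watching traffic with dummy data.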

Background, planning, and caveats

Approximate planning cost (U.S.)

Planning a local DeepSeek-R1 setup is mostly a question of compute:

- Hardware: larger tags and longer context windows need more RAM or VRAM and more disk.
- Storage: each model tag has a fixed download size, and the local model store grows with every tag you pull.
- Optional cloud/VPS: rent VPS or GPU instances only when you need predictable uptime or higher throughput.
- API/model costs: a local runtime removes per-request cloud costs, but external model providers or web services can still bill usage.
- Optional managed model services: worth considering if keeping up with model upgrades and hardware replacement becomes the bottleneck.

Cost breakdown

One-time local setup:
- model-capable hardware,
- SSD/storage,
- local backup media.

Recurring local costs:
- electricity and connectivity for longer inference runs,
- any third-party API usage outside Ollama.

Data persistence and backups

Keep Ollama model artifacts and related configuration recoverable:

- preserve downloaded model artifacts after ollama pull,
- back up prompt templates and wrapper scripts used by your local apps,
- version your Open WebUI and client configuration when changing providers.

Credential handling

- Avoid hardcoding provider endpoints and API keys in scripts.
- For UI or API integrations, keep keys in environment variables or secret storage.
- Rotate keys before exposing local inference to another device.
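As one way to follow that advice, a script can read its endpoint and key from the environment and fall back to local defaults. The variable names `LOCAL_LLM_BASE_URL` and `LOCAL_LLM_API_KEY` are hypothetical and chosen only for this sketch.

```python
import os
from typing import Mapping

DEFAULT_BASE_URL = "http://localhost:11434/v1"


def load_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Read endpoint and key from the environment instead of hardcoding them.

    LOCAL_LLM_BASE_URL and LOCAL_LLM_API_KEY are hypothetical variable names.
    """
    return {
        "base_url": env.get("LOCAL_LLM_BASE_URL", DEFAULT_BASE_URL),
        # Ollama ignores the key, but OpenAI-style clients need a non-empty value.
        "api_key": env.get("LOCAL_LLM_API_KEY", "ollama"),
    }


print(load_settings({})["base_url"])  # http://localhost:11434/v1
```

Passing the mapping explicitly, as in `load_settings({})`, keeps the function easy to test without touching the real environment.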

Security risk and operational cautions

- Local inference is private by default only if model and data stay on trusted devices.
- Reasoning models often produce longer output; sanitize any generated content before sending it to downstream automation.
- Don't run unknown model binaries or prompts against sensitive environments without dry-run validation.

Questions before installing

- Is your workload better served by a reasoning model (1.5b to 671b) or by a faster, more predictable non-reasoning model?
- What minimum quality threshold and maximum latency are acceptable in your use case?
- Do you have a clear fallback if a model tag is unavailable or replaced upstream?

Red flags

- Choosing a model size that clearly exceeds your machine capacity.
- Assuming full local privacy while the UI or integrations sync history externally.
- No documented rollback plan if a model update changes behavior.

Rollback and update guidance

For rollback:

- keep working model tags and configs as checkpoints,
- switch to a smaller or previously working tag when a new version causes regressions.

For updates:

- refresh a tag with ollama pull, which downloads only the layers that changed,
- check /api/tags and run a short benchmark prompt before switching workflows,
- keep the previous tag available for quick fallback.

Bottom line

Use Ollama for the simplest local DeepSeek-R1 setup. Start with deepseek-r1:8b or deepseek-r1:14b, move up to 32b if your hardware can handle it, and treat 70b and 671b as serious hardware projects.

For a browser UI, connect Open WebUI to Ollama. For apps and scripts, use http://localhost:11434 for Ollama's native API or http://localhost:11434/v1 for OpenAI-compatible clients.
