How to download and run GGUF AI LLM models from Huggingface in the Ollama Open-WebUI?

p.kaczmarek2
Screenshot of the GGUF model upload interface in Open WebUI with the file mistral-7b-instruct-v0.2.Q8_0.gguf selected.
Open-WebUI is a great tool for running multimodal Large Language Models locally, but not all models are available for download directly through the OWUI web panel. Luckily, GGUF models can be downloaded externally and then uploaded to OWUI through the experimental GGUF import. Here I will show you, step by step, how to import such a model.

This topic assumes that you already have Open-WebUI set up; if not, please check out the previous tutorial:
    ChatGPT locally? AI/LLM assistants to run on your computer - download and installation
You may also find this interesting:
    Minitest: robot vision? Multimodal AI LLaVA and workshop photo analysis - 100% local

Let's start by considering what GGUF actually is.
GGUF, which stands for GPT-Generated Unified Format, is the successor to the GGML (GPT-Generated Model Language) format and was released on 21st August 2023. It is a file format used to store GPT-like models for inference. GGUF models can run on both GPU and CPU, and the format provides extensibility, stability and versatility.
    Diagram of GGUF file structure showing section breakdown and example metadata.
Details about GGUF can be found on the Hugging Face site.
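Before hunting for a model, it may help to see how simple the container actually is. According to the GGUF spec, every file starts with a small fixed header: 4 magic bytes ("GGUF"), a version number, a tensor count and a metadata key/value count, all little-endian. Here is a minimal Python sketch that reads just that header, so you can sanity-check a downloaded file; the filename is only an example:

Code:
    import struct

    # Read the fixed GGUF header: 4-byte magic, uint32 version,
    # uint64 tensor count, uint64 metadata key/value count (little-endian).
    with open("mistral-7b-instruct-v0.2.Q8_0.gguf", "rb") as f:
        magic = f.read(4)
        if magic != b"GGUF":
            raise ValueError(f"Not a GGUF file, magic was {magic!r}")
        (version,) = struct.unpack("<I", f.read(4))
        tensors, metadata_kvs = struct.unpack("<QQ", f.read(16))

    print(f"GGUF v{version}: {tensors} tensors, {metadata_kvs} metadata entries")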

    So first, you will need a GGUF model.
    Go to https://huggingface.co/models and browse the models for download.
Not all models are available in GGUF, so filter the entries by GGUF:
    Screenshot of Hugging Face website with GGUF model search.
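If you prefer scripting to clicking, the huggingface_hub Python library (pip install huggingface_hub) can apply the same GGUF filter. A minimal sketch, listing the most-downloaded GGUF repositories:

Code:
    from huggingface_hub import HfApi

    # List model repositories tagged as GGUF, most-downloaded first.
    api = HfApi()
    for model in api.list_models(library="gguf", sort="downloads", limit=5):
        print(model.id)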

    For example, let's download TheBloke/Mistral-7B-Instruct-v0.2-GGUF
    Screenshot of the Mistral-7B-Instruct model file list.

I've chosen the mistral-7b-instruct-v0.2.Q8_0.gguf version.
    Downloading the model mistral-7b-instruct-v0.2.Q8_0.gguf at 10.3 MB/s, 12 minutes remaining.
    Wait for the download to finish:
    Screenshot showing a downloaded GGUF model file named mistral-7b-instruct-v0.2.Q8_0.gguf.
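By the way, the same download can be scripted with huggingface_hub instead of the browser; this sketch fetches exactly the file chosen above into the local Hugging Face cache:

Code:
    from huggingface_hub import hf_hub_download

    # Download one GGUF file from the repository; returns the local path.
    path = hf_hub_download(
        repo_id="TheBloke/Mistral-7B-Instruct-v0.2-GGUF",
        filename="mistral-7b-instruct-v0.2.Q8_0.gguf",
    )
    print("Saved to:", path)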
Now enter the settings; you will need to upload the model you've just downloaded.
    Screenshot of Open WebUI interface with codellama:latest model loaded.
In the Models section, find the experimental GGUF upload form:
    Settings panel in Open-WebUI with options for managing GGUF models.
    Select the file you want to upload:
    Screenshot of the settings panel in the Open WebUI application with the Models tab selected.
Now you need to be very patient and keep this page open; there is currently no proper progress indicator for the upload.
    Screenshot of Open-WebUI settings with the option to upload a GGUF model selected.
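If the browser upload keeps failing on a file this large, there is a fallback: Open-WebUI lists whatever models its Ollama backend knows about, so you can import the GGUF into Ollama directly with a Modelfile. A minimal sketch, assuming the ollama CLI is on your PATH (the model name mistral-7b-q8 is just my own label):

Code:
    import pathlib
    import subprocess

    # Point a Modelfile at the downloaded GGUF, then register it with Ollama.
    gguf = pathlib.Path("mistral-7b-instruct-v0.2.Q8_0.gguf").resolve()
    pathlib.Path("Modelfile").write_text(f"FROM {gguf}\n")
    subprocess.run(["ollama", "create", "mistral-7b-q8", "-f", "Modelfile"], check=True)

After a refresh, the model should show up in Open-WebUI's model list like any other Ollama model.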
Finally, once the import is done, you will be able to select the new model on the main page:
    Open WebUI interface with loaded model mistral-7b-instruct-v0.2.Q8_0.gguf, size 7.2 GB
    Your model is now ready to run:
    Screenshot of Open WebUI displaying a conversation about creating a sticky website header using CSS and JavaScript.
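Since Open-WebUI sits on top of Ollama, you can also talk to the freshly imported model from code through Ollama's REST API (default port 11434). A minimal sketch, assuming the model kept the name of the uploaded file; adjust the name to whatever your instance shows:

Code:
    import json
    import urllib.request

    payload = {
        "model": "mistral-7b-instruct-v0.2.Q8_0.gguf",  # adjust to your instance
        "prompt": "Explain the GGUF format in one sentence.",
        "stream": False,  # one JSON object instead of a token stream
    }
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        print(json.load(resp)["response"])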

    And that's all! This way you can run any GGUF model, of course, as long as your hardware supports it!
    Have you managed to run some models that way? Which GGUF model is your favourite? Let me know and stay tuned.
