ChatGPT locally? AI/LLM assistants to run on your computer - download and installation

p.kaczmarek2
  • Screenshot of Open WebUI interface with options for questions.
    It would seem that today's great AI models are inextricably linked to the cloud, and therefore also to a lack of privacy. Nothing could be further from the truth! I will show here how you can easily download and run, 100% locally, an interesting alternative to ChatGPT, complete with a confusingly similar web interface offering a choice of LLM model, chat history, and even the ability to interpret uploaded images and files.
    All this is offered by a project called Open-WebUI, link to homepage/repository below:
    https://github.com/open-webui/open-webui

    Open-WebUI installation
    There is a "Quick Start With Docker" section in the Readme but, as usual, the installation turned out to be more difficult than the project authors declare.
    Docker is an environment for running various projects that is characterized by containerization and software isolation, as well as its portability.
    Initially I tried to install Docker 4.27.1 on my Windows 10:
    Screenshot of Docker Desktop 4.27.1 installation showing file unpacking process.
    Unfortunately, it didn't work; the spoiler explains why:

    Spoiler:

    It would seem that the installation was successful: Installation screen of Docker Desktop 4.27.1 showing installation success.
    We launch...
    Modified Windows Start menu with Docker Desktop application.
    We configure:
    Screen of Docker Desktop 4.27.1 setup completion with the option to use recommended settings selected.
    Login is optional:
    Docker Desktop welcome screen with sign-up and login options.
    And there is a problem, something is wrong with WSL, i.e. the "Linux" overlay on Windows:
    Docker Desktop error message with a Quit button.
    There are several topics on the docker forum about this:
    https://forums.docker.com/t/updating-wsl-upda...-update-n-web-download-not-supported/138452/9
    Below are screenshots of my attempts to fix the problem - without positive results, so no comment:
    PowerShell window with WSL feature enabled on Windows 10.
    Screenshot of a Windows PowerShell window with DISM commands.
    Download progress bar for WSL Update file in the browser.
    Screenshot of Windows PowerShell console with a command to set the default WSL version to 2.
    Screenshot of Windows PowerShell window during installation process.

    To sum up, Open-WebUI did not work for me on the latest Docker, so I started checking previous versions of Docker, and it turned out that the problem does not occur in 4.24.1:
    Screenshot showing a successful installation message for Docker Desktop version 4.24.1.
    Success: we immediately get the Docker panel:
    Docker Desktop main panel with the containers section visible.
    Docker is ready. Now you need to download the repository; you can even do it through a browser, Git is not necessary. We simply find the ZIP download:
    https://github.com/open-webui/open-webui
    Screenshot of a GitHub page with download options for the Open-WebUI repository.
    Then we install everything via Docker. I chose the version without GPU support (everything will be computed on the CPU; plenty of RAM will also be required):
    Installation instructions for Open WebUI with Docker Compose.
    Finally a flawless installation:
    Screenshot of a terminal showing the progress of installing Open-WebUI using Docker.
    From this point on, Open WebUI should be running in Docker:
    Screenshot of Docker Desktop panel with running container open-webui-main.

    First run
    We run Open WebUI in Docker. Open the created web service (our local IP + the selected port) to see the web interface of the language models:
    Open WebUI login page with fields for email and password.
    The first login to WebUI will require creating an account, but this account is created on our own server; everything stays local:
    Open WebUI registration screen with fields for name, email, and password.
    Then we are greeted by a web panel like the one known from ChatGPT:
    Screenshot of the Open WebUI interface asking: How can I help you today?
    Nothing can be run at this point - we are missing language models:
    Open WebUI user interface with a model selection option.
    You have to click on the cog and download a model.
    To download a model, search for its name here:
    https://ollama.com/library
    And then paste it into the download as follows:
    Screenshot of the Open-WebUI settings panel with the mistral:7b model being downloaded.
    After downloading, the model will appear in the list to select:
    Open WebUI interface with a dropdown menu for model selection.
    You can ask it questions just like in ChatGPT. The capabilities depend on the model, and the RAM requirements and token generation speed likewise depend on what we have chosen; there are "lighter" and "heavier" models.
    Open WebUI user interface with a query about CSS and JavaScript code.

    Text and code
    Locally running models are able to generate code and complete text similarly to ChatGPT, although the final results depend on the model chosen. Here are some examples:
    Screenshot showing sample Arduino code for reading ADC value and blinking an LED.
    C++ code for controlling an LED using ADC readings.
    The above example with mapping the value read from the ADC to the LED blinking frequency is very similar to ChatGPT.
    Similarly with other code:
    Browser window with Open-WebUI interface showing a step-by-step guide for creating a WWW page for ESP8266.
    Code sets up the ESP8266 as a simple web server returning HTML content on request.
    Here it has made some things up: this link does not lead to an existing page, and the code itself is incomplete at best.
    Of course, models are also susceptible to hallucinations. Here, for example, code is generated for the non-existent MCP237017 chip, a name that resulted from a typo I made:
    Window showing Arduino code for non-existent MCP237017 chip in Open WebUI.

    Image analysis
    Some models can also describe images. An example is llava, which can, for example, describe an apple:
    Red apple with a green leaf covered in water droplets on a white background.
    It is not afraid of more difficult tasks; here it describes an LED "bulb":
    Image of an LED light bulb lying on a wooden surface.
    Unfortunately, when it comes to slightly less popular issues, the model gets lost and creates hallucinations:
    The image shows a tool with a metallic head and handle, lying on a wooden workbench.
    If you want me to try an image with this model, post the image in the topic; if I have a moment, I will check the results.

    Advanced settings
    Open WebUI offers much more than the OpenAI website: here we can change advanced language model settings such as Temperature, Mirostat, Top K, Stop Sequence, Max Tokens, Context Length and much more. For explanations, please refer to materials on the Internet.
    Screenshot showing advanced settings of Open WebUI


    Hardware requirements?
    It all depends on whether we use the GPU or CPU version and which model we run, but in general it is not very fast and takes up a lot of RAM. Here's a screenshot from my machine after an hour of play:
    Screenshot of Windows Task Manager with 96% memory usage.
    As for the reaction time... let us measure it, for example on Mistral:7b:
    Screenshot of Open WebUI interface showing a conversation about the Roman Empire
    The above response took 45 seconds to generate:
    Screenshot from HWiNFO64 showing details of the Intel Core i7-6700HQ processor.

    Summary
    These free, local equivalents will not beat commercial language models, but they really do provide more than a substitute for the popular ChatGPT, locally. Additionally, there are plenty of these models and they offer many parameterization and tuning options, which lets us adapt them to our needs. In the future I intend to compare them more thoroughly and explore their capabilities, but I don't know how to go about it yet, and the huge selection of LLMs to download doesn't make it easier.
    Installing the whole thing is quite simple as long as we don't fall into the "trap" of the new Docker, which I wrote about at the beginning; following my instructions, we will install everything without problems.
    However, I don't know how it runs on weaker hardware; if anyone wants to have fun, I invite you to test it and share your results!
    Have you tried running LLM language models locally, and if so, what were the results? I invite you to discuss.

    About Author
    p.kaczmarek2
    Moderator Smart Home
    p.kaczmarek2 wrote 12319 posts with rating 10203, helped 583 times. Been with us since 2014.
  • #2 21031745
    dktr
    Level 25  
    I ran Easy Diffusion locally on the GPU (an RTX 4070 with a 10th-gen Core i9). I'm surprised how well it already works and what quality graphics can be generated. A 720x720 image takes approximately 15 seconds to generate.
  • #3 21036351
    Mateusz_konstruktor
    Level 36  
    p.kaczmarek2 wrote:
    If you want me to add an image to this model, please post the image in the topic, if I have a moment I will check the results.

    Something more difficult.
    Close-up of electronic components on a green circuit board, including an AO201F transistor and a resistor.

    Something easier.
    White toaster with two slots and a dial for toast browning control.
  • #4 21036590
    p.kaczmarek2
    Moderator Smart Home
    Before checking, I will write down my prediction: it will cope with the toaster, but not so much with the chip; in the case of the chip it will read the marking incorrectly and write something generic about the IC.

    Checking:
    White toaster on a surface with two slots and a dial for adjustment.
    As I thought, the second one:
    Close-up of a printed circuit board (PCB) with several electronic components and descriptions.
    Also as expected, although it actually read the marking fairly closely: AD0201F.

    Any other pictures I can check?
  • #5 21036696
    Mateusz_konstruktor
    Level 36  
    p.kaczmarek2 wrote:
    Any other pictures I can check?

    Especially for my colleague; at the same time, deliberately not very sophisticated, except for the last one.
    Panasonic battery on a beige background with the text Made in Poland. AA Ni-MH 400mAh 1.2V battery on a fabric background. Molex power connector with four cables against a fabric background. Close-up of a 24-pin ATX connector on a fabric background. Two-prong electrical plug on textile material.
  • #6 21036871
    p.kaczmarek2
    Moderator Smart Home
    The battery went a bit poorly: it read "Panasonic" but came up with the nonsense that it is a power bank:
    Close-up of a Panasonic battery on a neutral surface.

    Here again sense and nonsense are mixed (it read 1539 as 1569 but interpreted it as a capacity), yet it also correctly read NiMH and SANIK...
    SANIK NiMH rechargeable battery with 400 mAh capacity and 1.2 V voltage

    Next are the connectors, but I know they weren't in the training data, so we won't discover anything new:
    The image shows a white plastic component with several wires, likely a Molex connector used in electronics.
    Image of a set of electronic connectors in a multi-pin plug.
    Black plug with a white adapter, plugged into a socket.
  • #7 21037069
    Mateusz_konstruktor
    Level 36  
    p.kaczmarek2 wrote:
    Next are the connectors, but I know they were not in the training data...

    Have these connectors been added at this point?
    How is this "training" and adding of further data performed?
  • #8 21037092
    p.kaczmarek2
    Moderator Smart Home
    I see that a topic about the basics of LLMs in general would be useful...

    A typical user is not able to train an LLM; it requires enormous computing power and a huge number of training examples. You can read about training LLaVA here:
    https://github.com/haotian-liu/LLaVA
    Quote:

    🔥[NEW!] LLaVA-1.5 achieves SoTA on 11 benchmarks, with just simple modifications to the original LLaVA, utilizes all public data, completes training in ~1 day on a single 8-A100 node, and surpasses methods that use billion-scale data.

    https://llava-vl.github.io/
    Additionally, training requires a huge amount of well-described training data, for example:
    https://huggingface.co/datasets/liuhaotian/LLaVA-Instruct-150K/viewer?row=0
    Typical users are, rather, limited to downloading already-trained models, if these are made available.


    An LLM also does not learn while you talk to it. An LLM is a mechanism like a "mathematical function": you simply provide an image and text as input, and the output is an answer; the LLM does not "remember" other conversations you have had with it. What I have is a model trained by others and made available for free, which I merely run locally with different inputs. The only thing the LLM "remembers" is the course of the current conversation, as long as it fits in the context window.
  • #9 21037557
    Mateusz_konstruktor
    Level 36  
    A.
    p.kaczmarek2 wrote:
    A typical user is not able to train LLM, it requires enormous computing power and a huge number of teaching examples.

    Isn't it the case that properly selected training examples require only a small amount of computing power, within the capabilities of even a relatively weak home computer? I am thinking of cases such as a transistor in a popular package photographed in an appropriate way. Doesn't the matter generally come down to analyzing the shape of a geometric figure?
    I am attaching an example image.

    Transistor in a TO-220 package with three legs, on a fabric background.

    B
    Isn't there, therefore, some function for sending the ready-made results of analyses performed by individual users to the "cloud", so to speak?
  • #10 21037579
    p.kaczmarek2
    Moderator Smart Home
    It doesn't work like that at all. The process you are asking about is called fine-tuning, and it involves providing additional training examples in the form of a photo and a text description.
    Snippet of documentation on dataset format for fine-tuning the LLaVA model
    You would have to have, for example, a dataset of transistors.
    Source:
    https://github.com/haotian-liu/LLaVA/blob/main/docs/Finetune_Custom_Data.md
    Video related to the topic:




    The model discussed here can be run fully locally, meaning it works without the Internet. I simply downloaded the previously trained model (the weights) and the environment to run it, and I can run it on my computer.
  • #11 21037915
    Mateusz_konstruktor
    Level 36  
    However, I expected some solution analogous to sending error reports through a web browser.
    That would be an invaluable source of data, instead of tedious training on machines that really do require huge computing power.
    In the form presented by my colleague it is very resource-intensive and impractical, when a mechanism supporting design and corrections like the one described above has been functioning for many years.

Topic summary

The discussion revolves around the feasibility of running AI models, specifically LLMs (Large Language Models), locally on personal computers, emphasizing privacy concerns associated with cloud-based solutions. The Open-WebUI project is highlighted as a viable alternative to ChatGPT, allowing users to download and operate LLMs locally. Installation challenges, particularly with Docker, are noted, alongside user experiences with local AI implementations, such as Easy Diffusion on an RTX 4070 GPU. The conversation also touches on the limitations of typical users in training LLMs due to the high computational requirements and the nature of fine-tuning models with additional data. The potential for user-generated data to improve model performance is discussed, although it is clarified that LLMs do not learn from individual interactions in real-time.
Summary generated by the language model.