[AI] Graphics and image generator on your own computer - web interface for Stable Diffusion

p.kaczmarek2
  • #31 21495707
    gulson
    System Administrator
That's graphic designers on UBI (universal basic income) now. All in all, I'm slowly getting ready for it too.
  • #32 21496958
    p.kaczmarek2
    Moderator Smart Home
    andrzejlisek wrote:
Such a general question: can this whole offline generator work deterministically? Is it possible to configure it (e.g. by specifying a seed for the random draw) so that if I run an image query I get an image, and if I later run the same query I get the same image, not a different variant depicting the same thing?

    Normally this uses pseudo-randomisation, and every time it is run the result is different, as intended. Is it possible to control this, i.e. to eliminate the pseudo-randomness or to start from the same seed every time?
    There is an option there to set a seed:
    Screenshot of the settings interface with an option to set the Seed value.
    So a fixed seed will give the same image, and as a test you can then change something else, such as the weight of a LoRA, and see how that affects the result.
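    For illustration, the same principle can be reproduced outside the web UI with the Hugging Face diffusers library. A minimal sketch, assuming a local NVIDIA GPU (the model ID is one of those linked later in this thread; this is not Fooocus's internal code):

    import torch
    from diffusers import StableDiffusionPipeline

    # Load a Stable Diffusion checkpoint (example model, also linked in this thread).
    pipe = StableDiffusionPipeline.from_pretrained(
        "stable-diffusion-v1-5/stable-diffusion-v1-5",
        torch_dtype=torch.float16,
    ).to("cuda")

    prompt = "a glass of red wine on a dark background"

    # A generator with a fixed seed makes the sampling deterministic:
    # the same seed + prompt + parameters reproduce the same image.
    g = torch.Generator("cuda").manual_seed(42)
    image1 = pipe(prompt, generator=g).images[0]

    g = torch.Generator("cuda").manual_seed(42)
    image2 = pipe(prompt, generator=g).images[0]

    # image1 and image2 are identical; changing the seed (or the requested
    # width/height) gives a different variant of the same subject.
    image1.save("wine_seed42.png")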

    gulson wrote:
    Yesterday they released a real bomb:
    https://openai.com/index/introducing-4o-image-generation/
    Sensational generation of captions, postcards, infographics, icons etc.
    Graphic designers hate them.
    Well, and you can already generate a proper glass of wine:
    A glass of red wine on a dark background.


    gulson wrote:
    That's graphic designers on UBI (universal basic income) now. All in all, I'm slowly getting ready for it too.

    Slightly related quote:
    Close-up of a magazine page featuring a quote about AI preferences, discussing AI doing household chores.
  • #33 21497168
    OPservator
    Level 39  
    @p.kaczmarek2 Honestly, I wouldn't mind lying belly-up drinking a beer in front of the TV, because there's always something to do, something the AI won't do for you. I still think the so-called "golden hands" shouldn't feel threatened: if a person doesn't know how to do something, even an idiot-proof manual won't help them; we simply have different limits in different areas, and you can't be an ace in every field :)

    As for me, just so I don't go crazy, I need those 2-3 days of work a week - whether I'm cleaning, washing, cooking, cutting tiles or fitting cables or valves doesn't really matter.

    While in summer I wouldn't complain about a lifelong L4 (sick leave), because I could go fishing or sightseeing on my motorbike, in winter there's basically nothing interesting to do - I can't skate, and even if I could, that's two hours of fun at most.

    The gym would probably come in as a filler, because what else would I do with so much free time?
  • #34 21535292
    andrzejlisek
    Level 31  
    I have Ubuntu 20.04 and wanted to play with this myself. I went the easy way and asked GPT-4.1 how to install this whole Stable Diffusion thing, step by step.

    It suggested this project: https://github.com/AUTOMATIC1111/stable-diffusion-webui
    This is a different interface to the Stable Diffusion models; Fooocus seems to have borrowed from it, or vice versa. All in all, it doesn't matter - the important thing is that it works.

    It also "guided me" to the models:
    https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-v1-5 - basic model, comes in two variants, one is "emaonly"
    https://huggingface.co/stabilityai/stable-diffusion-2-1-base - a newer model, in two variants "nonema" and "ema"
    https://huggingface.co/stabilityai/stable-diffusion-2-1 - theoretically better than "base", but in practice does not work for "webui". The image forms, but always presents a patchwork of blobs.
    https://huggingface.co/SG161222/Realistic_Vision_V5.1_noVAE - I tested "Realistic_Vision_V5.1.safetensors" and also think it's worth a look.
    https://huggingface.co/RunDiffusion/Juggernaut-XL-v8/tree/main - and of course juggernautXL_v8Rundiffusion.safetensors .
    https://huggingface.co/JCTN/fav_models/tree/main - contains identical file "juggernautXL_v8Rundiffusion.safetensors"

    I tested the same prompts that p.kaczmarek2 provides, and the juggernautXL_v8Rundiffusion.safetensors model produced images in the same style as the ones he posted.

    Repeatability is possible, as far as it is theoretically achievable: just set Seed to 0 or some positive number. But there is one catch: if you want to generate the same image at a higher resolution, it will come out different.

    And here is my favourite test prompt: Naked woman on the beach . All the Stable Diffusion models gave it a go, with better or worse results, while commercial models like OpenAI's DALL-E 3 just lecture you instead. This is the best test of whether a model always produces an image, better or worse, or only produces one when "it feels like it".
  • #35 21535294
    p.kaczmarek2
    Moderator Smart Home
    @andrzejlisek And have you tried generating first and then upscaling separately?
  • #36 21535303
    andrzejlisek
    Level 31  
    p.kaczmarek2 wrote:
    @andrzejlisek And have you tried generating first and then upscaling separately?

    Not yet. For now I always test at 512x512, and with the same model and the same parameters, the same image always comes out. I haven't got to upscaling yet, but I'll probably play around with it. Even when I create a series, after increasing batch size or batch count, it always comes out as a series of the same images.

    As for the model https://huggingface.co/stabilityai/stable-diffusion-2-1 , I tested it at both 512x512 and 768x768, and not a single meaningful image emerged.

    A good parameter is sampling steps. When it is lowered, the image comes out "more or less similar" but more distorted, though much faster. I was looking for a way to "quickly" generate several images and then re-generate only the one I like best. Changing the resolution is not the way to go; what remains is either to simply upscale, or to generate the series at lower sampling steps and the final image at higher sampling steps.
  • #37 21535309
    p.kaczmarek2
    Moderator Smart Home
    And have you studied the effect of guidance scale on the images produced? This also has interesting effects.

    There's also a second way to improve an image - if you don't want to increase the resolution but rather improve a selected section of the image, you have an "Improve detail" option in inpainting. You give it the image as input, select what to improve with the brush, and it improves the selected fragment.
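    The same "repaint only a selected region" idea can be sketched with the diffusers library (a hedged sketch, assuming a dedicated inpainting checkpoint; the web UI's "Improve detail" is its own wrapper around this kind of pipeline, not this exact code, and the file names here are hypothetical):

    import torch
    from PIL import Image
    from diffusers import StableDiffusionInpaintPipeline

    # An inpainting checkpoint regenerates only the masked region of an image.
    pipe = StableDiffusionInpaintPipeline.from_pretrained(
        "stabilityai/stable-diffusion-2-inpainting",
        torch_dtype=torch.float16,
    ).to("cuda")

    init_image = Image.open("generated.png").convert("RGB")
    # White pixels in the mask = area to repaint, black = keep untouched.
    # (mask_hand.png is a hypothetical file, e.g. exported from the UI's brush.)
    mask = Image.open("mask_hand.png").convert("RGB")

    fixed = pipe(
        prompt="a detailed, anatomically correct hand",
        image=init_image,
        mask_image=mask,
        generator=torch.Generator("cuda").manual_seed(42),
    ).images[0]
    fixed.save("generated_fixed.png")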
  • #38 21535318
    andrzejlisek
    Level 31  
    p.kaczmarek2 wrote:
    And have you studied the effect of guidance scale on the images produced? This also has interesting effects.
    In my installation it appears under the name CFG scale, but it's probably the same thing. I haven't looked into it yet; I've only just installed everything and tested different models and prompts.

    p.kaczmarek2 wrote:
    There's also a second way to improve an image - if you don't want to increase the resolution but rather improve a selected section of the image, you have an "Improve detail" option in inpainting. You give it the image as input, select what to improve with the brush, and it improves the selected fragment.

    The problem is that producing an image takes a long time, especially a series. I've been looking for a workflow like this (a code sketch of it follows below):
    1. I create a series of lower-quality images, choosing a prompt, CFG, seed and even a model. Each series is created in a relatively short time.
    2. If I don't like any of the images, I change the parameters, especially the prompt or seed, and create a new series.
    3. I choose the one image I like the most; I know exactly how it was created (model, prompt, seed).
    4. I create the same image again, but in good quality, and it's only natural that I'll have to wait a bit for it.
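    That workflow is easy to sketch in Python with the diffusers library (illustrative only; the file names and the number of preview seeds are arbitrary assumptions):

    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "stable-diffusion-v1-5/stable-diffusion-v1-5",
        torch_dtype=torch.float16,
    ).to("cuda")

    prompt = "a fiat 126p in front of the Palace of Culture, photo style"

    # Steps 1-2: cheap previews - few sampling steps, several seeds.
    for seed in range(8):
        g = torch.Generator("cuda").manual_seed(seed)
        img = pipe(prompt, num_inference_steps=10, generator=g).images[0]
        img.save(f"preview_seed{seed}.png")

    # Step 3: review the previews by eye and note the seed of the best one.
    best_seed = 5  # example choice

    # Step 4: final render - the same prompt and seed, more sampling steps.
    g = torch.Generator("cuda").manual_seed(best_seed)
    final = pipe(prompt, num_inference_steps=50, generator=g).images[0]
    final.save("final.png")

    Note the caveat discussed above: the low-step preview and the high-step final are "more or less similar" rather than identical, so treat the preview as a compositional sketch.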
  • #39 21535738
    p.kaczmarek2
    Moderator Smart Home
    So if you use fewer steps you get a "sketch" of the image, which you can then redo in better quality, yes? Interesting.

    Also try using wildcard prompts. You create a text file in the appropriate folder (probably wildcards - you'll find it in Fooocus), put a separate set of keywords on each line, and in the prompt you use, for example, __mojWildcard__ (with the underscores). Each image generation will then replace __mojWildcard__ with a randomly selected line from that file (named mojWildcard).
    At least more or less like that - check it for yourself.
    This way you can prepare several variations of the prompt at once, let the generation run while you're away from the computer, and review the results when you come back.
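    A rough Python equivalent of that mechanism, just to show the idea (the __name__ convention follows the description above; this is not Fooocus's actual implementation):

    import random
    import re
    from pathlib import Path

    WILDCARD_DIR = Path("wildcards")  # e.g. wildcards/mojWildcard.txt

    def expand_wildcards(prompt: str) -> str:
        """Replace each __name__ token with a random line from wildcards/name.txt."""
        def pick(match: re.Match) -> str:
            lines = (WILDCARD_DIR / (match.group(1) + ".txt")).read_text(
                encoding="utf-8").splitlines()
            return random.choice([ln for ln in lines if ln.strip()])
        return re.sub(r"__(\w+)__", pick, prompt)

    # wildcards/mojWildcard.txt holds one set of keywords per line; each call
    # produces a prompt with a different randomly chosen line substituted in.
    print(expand_wildcards("a portrait, __mojWildcard__, studio lighting"))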
  • #40 21535827
    andrzejlisek
    Level 31  
    p.kaczmarek2 wrote:
    So if you use fewer steps you get a "sketch" of the image, which you can then redo in better quality, yes? Interesting.

    You could say that. The default is 20 and that's a moderately good level. 10 is the bare minimum for the image to look more or less the same - same composition, but with more randomness. 5 gives even more chaos, but the general "artistic concept" still holds. You can also go the other way and use, say, 50, and the picture will be more coherent and realistic, though only slightly better than at 20.

    But of course, the condition is that all the other parameters are identical, including the Seed.

    Added after 1 [minute]:

    p.kaczmarek2 wrote:
    And have you studied the effect of guidance scale on the images produced? This also has interesting effects.

    I have already tested this: a moderate level, around the default, gives good results. An extremely low value gives an image that is nice but only very loosely related to the prompt, and a very high one gives an artificial, ugly image.
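    For a systematic comparison, that sweep is easy to script with diffusers, where the same parameter is called guidance_scale (a sketch; the values are chosen arbitrarily around the usual default of 7.5):

    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "stable-diffusion-v1-5/stable-diffusion-v1-5",
        torch_dtype=torch.float16,
    ).to("cuda")

    prompt = "a glass of red wine on a dark background"

    # Fixed seed for every run, so the only difference between the images
    # is the CFG value: low = loose interpretation, high = rigid/artificial.
    for cfg in (2.0, 7.5, 15.0, 25.0):
        g = torch.Generator("cuda").manual_seed(42)
        img = pipe(prompt, guidance_scale=cfg, generator=g).images[0]
        img.save(f"cfg_{cfg}.png")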
  • #41 21542103
    OPservator
    Level 39  
    A Polish Militia Polonez stops a white Audi 80 in front of the Palace of Culture and Science, PRL era.


    Prompt: Create an image in the style of a photograph/photo.
    The scene takes place in communist times; an Audi 80 has been stopped by a police Polonez.
    The Palace of Culture can be seen in the background.
    Pay attention to the correct livery of the police car and the correct number plates of both cars.
  • #42 21542132
    p.kaczmarek2
    Moderator Smart Home
    What kind of generator is it? The latest one from ChatGPT?
  • #43 21542134
    OPservator
    Level 39  
    @p.kaczmarek2 yes, the latest from ChatGPT.

    Added after 48 [seconds]:

    Note that despite the logical error - the PRL era and the (modern) Police - ChatGPT pulled this off in downright exemplary fashion :)
  • #44 21542279
    p.kaczmarek2
    Moderator Smart Home
    I would prefer such a generator locally... But since we're already testing ChatGPT, here are some samples from me:

    Quote:
    Create an image in the style of a photograph/photo.
    The scene takes place in communist times; a Fiat 126p has been stopped by a police Polonez.
    In the background you can see the Palace of Culture.
    Pay attention to the correct livery of the police car and the correct number plates of both cars.
    A policeman writes a ticket next to a yellow Fiat 126p and a blue police car, with the Palace of Culture and Science in Warsaw in the background.

    Create an image in the style of a photograph/photo.
    The scene takes place in communist times; an Arduino UNO has been stopped by a police Polonez.
    In the background you can see the Palace of Culture.
    Pay attention to the correct livery of the police car and the correct number plates of both cars.
    An Arduino Uno on wheels next to a blue police car in front of the Palace of Culture in Warsaw.

    Create an image in the style of a photograph/photo.
    The scene takes place in communist times; an Arduino UNO has been stopped by a police Polonez.
    In the background you can see the Palace of Culture.
    Pay attention to the correct livery of the police car and the correct number plates of both cars.
    All the action takes place underwater.
    Underwater scene with a large Arduino UNO board and an old Polish police car in front of Warsaw's Palace of Culture and Science.
    You can play around with it. I might also test right away whether this generator understands complex relationships, i.e. if I tell it that one object is to be green and another red, will it keep the colours separate, or will there be, as it's called, "style bleed", where the style/colour/character transfers to adjacent things.
  • #45 21542284
    OPservator
    Level 39  
    p.kaczmarek2 wrote:
    You can play around with it.

    You disarmed me with that Arduino - I spat at my monitor, haha.

    p.kaczmarek2 wrote:
    whether this generator understands complex relationships

    I have an idea for a prompt, but I'll need to make it more specific - it has to be "idiot-simple", so that the artificial idiot - pardon the expression - can comprehend it 100% :D
  • #46 21542430
    p.kaczmarek2
    Moderator Smart Home
    I took a shot at complex relationships and I must say I am impressed:
    Quote:
    Create a graphic: in the top right corner a green Fiat 126P with square wheels, in the bottom left corner an Arduino Uno, in the top left corner a red flower, in the bottom right corner the word "TEST".
    Illustration showing a red flower, green car, Arduino Uno board, and the word TEST on a light background.

    Added after 2 [minutes]:

    Quote:
    Swap the contents of the bottom right corner and the bottom left corner.
    An illustration shows a red flower, a green car, an Arduino UNO board, and the word TEST on a light background.
    A high level of reasoning, I see. Graphic designers can start packing up...
  • #47 21542492
    OPservator
    Level 39  
    Four police officers detain two bald men on the hood of a police car on a deserted city street.
    My mistake for not emphasising that it should pay attention to the livery of the police car, but there aren't many mistakes - that side of the church adjoins the tenements and isn't visible from this angle, the police logo is mirrored, and even so the livery isn't correct. The eagle on the caps also came out poorly. But other than that? Perfect graphics for a clickbait thumbnail!
  • #48 21562421
    andrzejlisek
    Level 31  
    For the record, I'll write down how to run this on Ubuntu Linux 20.04, which is what I did. It's fairly straightforward:

    1. If you have an NVIDIA graphics card, a driver update will come in handy:
    sudo apt install nvidia-driver-535

    2. After a reboot, install Python:
    sudo apt install git python3 python3-venv python3-pip
    sudo apt install python3.9 python3.9-venv

    3. Install the server https://github.com/AUTOMATIC1111/stable-diffusion-webui :
    git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui.git
    cd stable-diffusion-webui

    4. Copy Stable Diffusion compatible models into the stable-diffusion-webui/models/Stable-diffusion subdirectory.
    For example https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-v1-5 .
    Models come in *.ckpt or *.safetensors format.

    5. Prepare the virtual runtime environment:
    python3.9 -m venv venv
    source venv/bin/activate

    6. Start the server:
    python3.9 launch.py
    If there are problems with start-up and operation, you can try the parameters:
    python3.9 launch.py --medvram --lowvram --xformers
    A variant using the processor instead of the graphics card:
    python3.9 launch.py --skip-torch-cuda-test --no-half --precision full --use-cpu all
    The console will show the server's address and, at the same time, a browser with that address will open by itself.
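    Once the server is up, it can also be driven from a script instead of the browser: starting it with the additional --api parameter exposes an HTTP endpoint. A hedged sketch (the /sdapi/v1/txt2img route and the field names follow AUTOMATIC1111's API as commonly documented, so verify them against your installed version):

    import base64
    import requests

    # Assumes the server was started with: python3.9 launch.py --api
    URL = "http://127.0.0.1:7860/sdapi/v1/txt2img"

    payload = {
        "prompt": "a glass of red wine on a dark background",
        "seed": 42,        # fixed seed for reproducibility
        "steps": 20,       # sampling steps
        "cfg_scale": 7.0,  # CFG / guidance scale
        "width": 512,
        "height": 512,
        "batch_size": 1,
    }

    resp = requests.post(URL, json=payload, timeout=600)
    resp.raise_for_status()

    # The API returns the generated images as base64-encoded PNGs.
    for i, img_b64 in enumerate(resp.json()["images"]):
        with open(f"api_result_{i}.png", "wb") as f:
            f.write(base64.b64decode(img_b64))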
  • #49 21562429
    p.kaczmarek2
    Moderator Smart Home
    As far as I can see, you've run the AUTOMATIC1111 UI and not Fooocus? What does it look like, how does it compare to Fooocus? What kind of workflow do you have?
  • #50 21562441
    andrzejlisek
    Level 31  
    Fooocus didn't work for me, and instead of trying to read up on how to configure and run it, I took the easy way out and asked GPT to spell out, step by step, exactly what was needed to run Stable Diffusion on a Linux system. It suggested AUTOMATIC1111, and indeed that worked out well, as my system wasn't ready out of the box - hence the additional upgrades and configuration.

    The interface itself is as in the screenshot https://github.com/AUTOMATIC1111/stable-diffusion-webui/blob/master/screenshot.png . Functionally, I think it's almost the same as far as text2img itself is concerned. In my opinion, it makes no difference whether it's AUTOMATIC1111 or Fooocus.

    The workflow, once everything is configured, is:
    1. I start the server and wait for the browser with the page to open.
    2. At the top, I select the model (I may have several models downloaded).
    3. I set all the parameters, seed to 0; note that the server does not "keep" the last used parameters. I leave the other parameters at their defaults.
    4. Using "batch size" or "batch count" I can produce several images at once.

    In practice there are sometimes software glitches. The interface sometimes doesn't want to "grab" the model when I try to select it from the list. Sometimes I've saved myself by moving unneeded files to another folder, so that there was only one model to choose from.

    The computer has 32GB of RAM and 4GB of VRAM. It happens that after a few generations the system slows down a lot; then I kill the server and the computer comes back to life. Simply closing the browser and restarting the server is enough.

    I tested the same prompts as in the first post, and with the same juggernautXL_v8Rundiffusion.safetensors model I was getting images in the same style as those posted there.

Topic summary

The discussion centers on generating AI-based images locally using Stable Diffusion models through user-friendly web interfaces like Fooocus and AUTOMATIC1111's stable-diffusion-webui. Fooocus offers fully offline image generation with features such as prompt-based creation, GPT2-assisted prompt development, upscaling, inpainting, outpainting, and image variation. Performance depends heavily on GPU VRAM, with advanced models like Stable Diffusion XL requiring 8GB to 24GB VRAM, favoring high-end NVIDIA GPUs (RTX 3090, 4090, 5090) due to Tensor core optimization. Users compare generation times and quality across GPUs (e.g., GTX1060 vs. RTX3070). Challenges include AI limitations in rendering complex elements like hands, wires, subtitles, and technical schematics, with "keyword/style bleed" affecting image consistency. Technical drawing generation remains problematic; however, language models can produce descriptive ASCII schematics and netlists, suggesting future workflows may combine textual planning with graphical output. Seed control enables deterministic image reproduction, and parameters like CFG/guidance scale and sampling steps influence image coherence and randomness. Commercial use of AI-generated images requires caution regarding trademark infringement, with some companies adding AI-generated disclaimers. Installation guidance for Ubuntu 20.04 includes NVIDIA driver updates, Python environment setup, and model downloads from Hugging Face. AUTOMATIC1111's web UI is functionally similar to Fooocus, with workflows involving model selection, parameter tuning, and batch generation followed by upscaling. Recent advances include OpenAI's token-based image generation (GPT-4o), which reasons in pixel space and shows promise for complex, interactive graphics. The AI-driven GPU demand has impacted hardware prices, complicating access for casual users.
Summary generated by the language model.