[AI] Graphics and image generator on your own computer - web interface for Stable Diffusion

p.kaczmarek2
  • #31 21495707
    gulson
    System Administrator
That's graphic designers on UBI (universal basic income) now. All in all, I'm slowly getting ready for it too.
  • #32 21496958
    p.kaczmarek2
    Moderator Smart Home
    andrzejlisek wrote:
Such a general question: can this whole offline generator work deterministically? Is it possible to configure it (e.g. by specifying a seed for the random draw) so that if I run an image query I get an image, and if I later run the same query I get the same image, not a different variant depicting the same thing?

    Normally this uses pseudo-randomisation, and every time it is run the result is different, as intended. Is it possible to control this, i.e. to eliminate the pseudo-randomness or to start from the same seed every time?
    There is an option there to set a seed:
    Screenshot of the settings interface with an option to set the Seed value.
    So a fixed seed will give the same image, and as a test you can then change something else, such as the weight of a LoRA, and see how that affects the result.
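    For illustration, the same principle can be reproduced outside the web UI with the Hugging Face diffusers library. A minimal sketch, assuming a local NVIDIA GPU (the model ID is one of those linked later in this thread; this is not Fooocus's internal code):

    import torch
    from diffusers import StableDiffusionPipeline

    # Load a Stable Diffusion checkpoint (example model, also linked in this thread).
    pipe = StableDiffusionPipeline.from_pretrained(
        "stable-diffusion-v1-5/stable-diffusion-v1-5",
        torch_dtype=torch.float16,
    ).to("cuda")

    prompt = "a glass of red wine on a dark background"

    # A generator with a fixed seed makes the sampling deterministic:
    # the same seed + prompt + parameters reproduce the same image.
    g = torch.Generator("cuda").manual_seed(42)
    image1 = pipe(prompt, generator=g).images[0]

    g = torch.Generator("cuda").manual_seed(42)
    image2 = pipe(prompt, generator=g).images[0]

    # image1 and image2 are identical; changing the seed (or the requested
    # width/height) gives a different variant of the same subject.
    image1.save("wine_seed42.png")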

    gulson wrote:
    Yesterday they released a real bomb:
    https://openai.com/index/introducing-4o-image-generation/
    Sensational generation of captions, postcards, infographics, icons etc.
    Graphic designers hate them.
    Well, and you can already generate a proper glass of wine:
    A glass of red wine on a dark background.


    gulson wrote:
    That's graphic designers on UBI (universal basic income) now. All in all, I'm slowly getting ready for it too.

    Slightly related quote:
    Close-up of a magazine page featuring a quote about AI preferences, discussing AI doing household chores.
  • #33 21497168
    OPservator
    Level 39  
    @p.kaczmarek2 Honestly, I wouldn't mind lying belly-up drinking a beer in front of the TV, because there's always something to do, something the AI won't do for you. I still think the so-called "golden hands" shouldn't feel threatened: if a person doesn't know how to do something, even an idiot-proof manual won't help them; we simply have different limits in different areas, and you can't be an ace in every field :)

    As for me, just so I don't go crazy, I need those 2-3 days of work a week - whether I'm cleaning, washing, cooking, cutting tiles or fitting cables or valves doesn't really matter.

    While in summer I wouldn't complain about a lifelong L4 (sick leave), because I could go fishing or sightseeing on my motorbike, in winter there's basically nothing interesting to do - I can't skate, and even if I could, that's two hours of fun at most.

    The gym would probably come in as a filler, because what else would I do with so much free time?
  • #34 21535292
    andrzejlisek
    Level 31  
    I have Ubuntu 20.04 and wanted to play with this myself. I went the easy way and asked GPT-4.1 how to install this whole Stable Diffusion thing, step by step.

    It suggested this project: https://github.com/AUTOMATIC1111/stable-diffusion-webui
    This is a different interface to the Stable Diffusion models; Fooocus seems to have borrowed from it, or vice versa. All in all, it doesn't matter - the important thing is that it works.

    It also "guided me" to the models:
    https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-v1-5 - basic model, comes in two variants, one is "emaonly"
    https://huggingface.co/stabilityai/stable-diffusion-2-1-base - a newer model, in two variants "nonema" and "ema"
    https://huggingface.co/stabilityai/stable-diffusion-2-1 - theoretically better than "base", but in practice does not work for "webui". The image forms, but always presents a patchwork of blobs.
    https://huggingface.co/SG161222/Realistic_Vision_V5.1_noVAE - I tested "Realistic_Vision_V5.1.safetensors" and also think it's worth a look.
    https://huggingface.co/RunDiffusion/Juggernaut-XL-v8/tree/main - and of course juggernautXL_v8Rundiffusion.safetensors .
    https://huggingface.co/JCTN/fav_models/tree/main - contains identical file "juggernautXL_v8Rundiffusion.safetensors"

    I tested the same prompts that p.kaczmarek2 provides, and the juggernautXL_v8Rundiffusion.safetensors model produced images in the same style as the ones he posted.

    Repeatability is possible, as far as it is theoretically achievable: just set Seed to 0 or some positive number. But there is one catch: if you want to generate the same image at a higher resolution, it will come out different.

    And here is my favourite test prompt: Naked woman on the beach . All the Stable Diffusion models gave it a go, with better or worse results, while commercial models like OpenAI's DALL-E 3 just lecture you instead. This is the best test of whether a model always produces an image, better or worse, or only produces one when "it feels like it".
  • #35 21535294
    p.kaczmarek2
    Moderator Smart Home
    @andrzejlisek And have you tried generating first and then upscaling separately?
  • #36 21535303
    andrzejlisek
    Level 31  
    p.kaczmarek2 wrote:
    @andrzejlisek And have you tried generating first and then upscaling separately?

    Not yet. For now I always test at 512x512, and with the same model and the same parameters, the same image always comes out. I haven't got to upscaling yet, but I'll probably play around with it. Even when I create a series, after increasing batch size or batch count, it always comes out as a series of the same images.

    As for the model https://huggingface.co/stabilityai/stable-diffusion-2-1 , I tested it at both 512x512 and 768x768, and not a single meaningful image emerged.

    A good parameter is sampling steps. When it is lowered, the image comes out "more or less similar" but more distorted, though much faster. I was looking for a way to "quickly" generate several images and then re-generate only the one I like best. Changing the resolution is not the way to go; what remains is either to simply upscale, or to generate the series at lower sampling steps and the final image at higher sampling steps.
  • #37 21535309
    p.kaczmarek2
    Moderator Smart Home
    And have you studied the effect of guidance scale on the images produced? This also has interesting effects.

    There's also a second way to improve an image - if you don't want to increase the resolution but rather improve a selected section of the image, you have an "Improve detail" option in inpainting. You give it the image as input, select what to improve with the brush, and it improves the selected fragment.
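    The same "repaint only a selected region" idea can be sketched with the diffusers library (a hedged sketch, assuming a dedicated inpainting checkpoint; the web UI's "Improve detail" is its own wrapper around this kind of pipeline, not this exact code, and the file names here are hypothetical):

    import torch
    from PIL import Image
    from diffusers import StableDiffusionInpaintPipeline

    # An inpainting checkpoint regenerates only the masked region of an image.
    pipe = StableDiffusionInpaintPipeline.from_pretrained(
        "stabilityai/stable-diffusion-2-inpainting",
        torch_dtype=torch.float16,
    ).to("cuda")

    init_image = Image.open("generated.png").convert("RGB")
    # White pixels in the mask = area to repaint, black = keep untouched.
    # (mask_hand.png is a hypothetical file, e.g. exported from the UI's brush.)
    mask = Image.open("mask_hand.png").convert("RGB")

    fixed = pipe(
        prompt="a detailed, anatomically correct hand",
        image=init_image,
        mask_image=mask,
        generator=torch.Generator("cuda").manual_seed(42),
    ).images[0]
    fixed.save("generated_fixed.png")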
  • #38 21535318
    andrzejlisek
    Level 31  
    p.kaczmarek2 wrote:
    And have you studied the effect of guidance scale on the images produced? This also has interesting effects.
    In my installation it appears under the name CFG scale, but it's probably the same thing. I haven't looked into it yet; I've only just installed everything and tested different models and prompts.

    p.kaczmarek2 wrote:
    There's also a second way to improve an image - if you don't want to increase the resolution but rather improve a selected section of the image, you have an "Improve detail" option in inpainting. You give it the image as input, select what to improve with the brush, and it improves the selected fragment.

    The problem is that producing an image takes a long time, especially a series. I've been looking for a workflow like this (a code sketch of it follows below):
    1. I create a series of lower-quality images, choosing a prompt, CFG, seed and even a model. Each series is created in a relatively short time.
    2. If I don't like any of the images, I change the parameters, especially the prompt or seed, and create a new series.
    3. I choose the one image I like the most; I know exactly how it was created (model, prompt, seed).
    4. I create the same image again, but in good quality, and it's only natural that I'll have to wait a bit for it.
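    That workflow is easy to sketch in Python with the diffusers library (illustrative only; the file names and the number of preview seeds are arbitrary assumptions):

    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "stable-diffusion-v1-5/stable-diffusion-v1-5",
        torch_dtype=torch.float16,
    ).to("cuda")

    prompt = "a fiat 126p in front of the Palace of Culture, photo style"

    # Steps 1-2: cheap previews - few sampling steps, several seeds.
    for seed in range(8):
        g = torch.Generator("cuda").manual_seed(seed)
        img = pipe(prompt, num_inference_steps=10, generator=g).images[0]
        img.save(f"preview_seed{seed}.png")

    # Step 3: review the previews by eye and note the seed of the best one.
    best_seed = 5  # example choice

    # Step 4: final render - the same prompt and seed, more sampling steps.
    g = torch.Generator("cuda").manual_seed(best_seed)
    final = pipe(prompt, num_inference_steps=50, generator=g).images[0]
    final.save("final.png")

    Note the caveat discussed above: the low-step preview and the high-step final are "more or less similar" rather than identical, so treat the preview as a compositional sketch.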
  • #39 21535738
    p.kaczmarek2
    Moderator Smart Home
    So if you use fewer steps you get a "sketch" of the image, which you can then redo in better quality, yes? Interesting.

    Also try using wildcard prompts. You create a text file in the appropriate folder (probably wildcards - you'll find it in Fooocus), put a separate set of keywords on each line, and in the prompt you use, for example, __mojWildcard__ (with the underscores). Each image generation will then replace __mojWildcard__ with a randomly selected line from that file (named mojWildcard).
    At least more or less like that - check it for yourself.
    This way you can prepare several variations of the prompt at once, let the generation run while you're away from the computer, and review the results when you come back.
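    A rough Python equivalent of that mechanism, just to show the idea (the __name__ convention follows the description above; this is not Fooocus's actual implementation):

    import random
    import re
    from pathlib import Path

    WILDCARD_DIR = Path("wildcards")  # e.g. wildcards/mojWildcard.txt

    def expand_wildcards(prompt: str) -> str:
        """Replace each __name__ token with a random line from wildcards/name.txt."""
        def pick(match: re.Match) -> str:
            lines = (WILDCARD_DIR / (match.group(1) + ".txt")).read_text(
                encoding="utf-8").splitlines()
            return random.choice([ln for ln in lines if ln.strip()])
        return re.sub(r"__(\w+)__", pick, prompt)

    # wildcards/mojWildcard.txt holds one set of keywords per line; each call
    # produces a prompt with a different randomly chosen line substituted in.
    print(expand_wildcards("a portrait, __mojWildcard__, studio lighting"))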
  • #40 21535827
    andrzejlisek
    Level 31  
    p.kaczmarek2 wrote:
    So if you use fewer steps you get a "sketch" of the image, which you can then redo in better quality, yes? Interesting.

    You could say that. The default is 20 and that's a moderately good level. 10 is the bare minimum for the image to look more or less the same - same composition, but with more randomness. 5 gives even more chaos, but the general "artistic concept" still holds. You can also go the other way and use, say, 50, and the picture will be more coherent and realistic, though only slightly better than at 20.

    But of course, the condition is that all the other parameters are identical, including the Seed.

    Added after 1 [minute]:

    p.kaczmarek2 wrote:
    And have you studied the effect of guidance scale on the images produced? This also has interesting effects.

    I have already tested this: a moderate level, around the default, gives good results. An extremely low value gives an image that is nice but only very loosely related to the prompt, and a very high one gives an artificial, ugly image.
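    For a systematic comparison, that sweep is easy to script with diffusers, where the same parameter is called guidance_scale (a sketch; the values are chosen arbitrarily around the usual default of 7.5):

    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(
        "stable-diffusion-v1-5/stable-diffusion-v1-5",
        torch_dtype=torch.float16,
    ).to("cuda")

    prompt = "a glass of red wine on a dark background"

    # Fixed seed for every run, so the only difference between the images
    # is the CFG value: low = loose interpretation, high = rigid/artificial.
    for cfg in (2.0, 7.5, 15.0, 25.0):
        g = torch.Generator("cuda").manual_seed(42)
        img = pipe(prompt, guidance_scale=cfg, generator=g).images[0]
        img.save(f"cfg_{cfg}.png")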
  • #41 21542103
    OPservator
    Level 39  
    A Polish Militia Polonez stops a white Audi 80 in front of the Palace of Culture and Science, PRL era.


    Prompt: Create an image in the style of a photograph/photo.
    The scene takes place in communist times; an Audi 80 has been stopped by a police Polonez.
    The Palace of Culture can be seen in the background.
    Pay attention to the correct livery of the police car and the correct number plates of both cars.
  • #42 21542132
    p.kaczmarek2
    Moderator Smart Home
    What kind of generator is it? The latest one from ChatGPT?
  • #43 21542134
    OPservator
    Level 39  
    @p.kaczmarek2 yes, the latest from ChatGPT.

    Added after 48 [seconds]:

    Note that despite the logical error - the PRL era and the (modern) Police - ChatGPT pulled this off in downright exemplary fashion :)
  • #44 21542279
    p.kaczmarek2
    Moderator Smart Home
    I would prefer such a generator locally... But since we're already testing ChatGPT, here are some samples from me:

    Quote:
    Create an image in the style of a photograph/photo.
    The scene takes place in communist times; a Fiat 126p has been stopped by a police Polonez.
    In the background you can see the Palace of Culture.
    Pay attention to the correct livery of the police car and the correct number plates of both cars.
    A policeman writes a ticket next to a yellow Fiat 126p and a blue police car, with the Palace of Culture and Science in Warsaw in the background.

    Create an image in the style of a photograph/photo.
    The scene takes place in communist times; an Arduino UNO has been stopped by a police Polonez.
    In the background you can see the Palace of Culture.
    Pay attention to the correct livery of the police car and the correct number plates of both cars.
    An Arduino Uno on wheels next to a blue police car in front of the Palace of Culture in Warsaw.

    Create an image in the style of a photograph/photo.
    The scene takes place in communist times; an Arduino UNO has been stopped by a police Polonez.
    In the background you can see the Palace of Culture.
    Pay attention to the correct livery of the police car and the correct number plates of both cars.
    All the action takes place underwater.
    Underwater scene with a large Arduino UNO board and an old Polish police car in front of Warsaw's Palace of Culture and Science.
    You can play around with it. I might also test right away whether this generator understands complex relationships, i.e. if I tell it that one object is to be green and another red, will it keep the colours separate, or will there be, as it's called, "style bleed", where the style/colour/character transfers to adjacent things.
  • #45 21542284
    OPservator
    Level 39  
    p.kaczmarek2 wrote:
    You can play around with it.

    You disarmed me with that Arduino - I spat at my monitor, haha.

    p.kaczmarek2 wrote:
    whether this generator understands complex relationships

    I have an idea for a prompt, but I'll need to make it more specific - it has to be "idiot-simple", so that the artificial idiot - pardon the expression - can comprehend it 100% :D
  • #46 21542430
    p.kaczmarek2
    Moderator Smart Home
    I took a shot at complex relationships and I must say I am impressed:
    Quote:
    Create a graphic: in the top right corner a green Fiat 126P with square wheels, in the bottom left corner an Arduino Uno, in the top left corner a red flower, in the bottom right corner the word "TEST".
    Illustration showing a red flower, green car, Arduino Uno board, and the word TEST on a light background.

    Added after 2 [minutes]:

    Quote:
    Swap the contents of the bottom right corner and the bottom left corner.
    An illustration shows a red flower, a green car, an Arduino UNO board, and the word TEST on a light background.
    A high level of reasoning, I see. Graphic designers can start packing up...
  • #47 21542492
    OPservator
    Level 39  
    Four police officers detain two bald men on the hood of a police car on a deserted city street.
    My mistake for not emphasising that it should pay attention to the livery of the police car, but there aren't many mistakes - that side of the church adjoins the tenements and isn't visible from this angle, the police logo is mirrored, and even so the livery isn't correct. The eagle on the caps also came out poorly. But other than that? Perfect graphics for a clickbait thumbnail!
  • #48 21562421
    andrzejlisek
    Level 31  
    For the record, I'll write down how to run this on Ubuntu Linux 20.04, which is what I did. It's fairly straightforward:

    1. If you have an NVIDIA graphics card, a driver update will come in handy:
    sudo apt install nvidia-driver-535

    2. After a reboot, install Python:
    sudo apt install git python3 python3-venv python3-pip
    sudo apt install python3.9 python3.9-venv

    3. Install the server https://github.com/AUTOMATIC1111/stable-diffusion-webui :
    git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui.git
    cd stable-diffusion-webui

    4. Copy Stable Diffusion compatible models into the stable-diffusion-webui/models/Stable-diffusion subdirectory.
    For example https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-v1-5 .
    Models come in *.ckpt or *.safetensors format.

    5. Prepare the virtual runtime environment:
    python3.9 -m venv venv
    source venv/bin/activate

    6. Start the server:
    python3.9 launch.py
    If there are problems with start-up and operation, you can try the parameters:
    python3.9 launch.py --medvram --lowvram --xformers
    A variant using the processor instead of the graphics card:
    python3.9 launch.py --skip-torch-cuda-test --no-half --precision full --use-cpu all
    The console will show the server's address and, at the same time, a browser with that address will open by itself.
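    Once the server is up, it can also be driven from a script instead of the browser: starting it with the additional --api parameter exposes an HTTP endpoint. A hedged sketch (the /sdapi/v1/txt2img route and the field names follow AUTOMATIC1111's API as commonly documented, so verify them against your installed version):

    import base64
    import requests

    # Assumes the server was started with: python3.9 launch.py --api
    URL = "http://127.0.0.1:7860/sdapi/v1/txt2img"

    payload = {
        "prompt": "a glass of red wine on a dark background",
        "seed": 42,        # fixed seed for reproducibility
        "steps": 20,       # sampling steps
        "cfg_scale": 7.0,  # CFG / guidance scale
        "width": 512,
        "height": 512,
        "batch_size": 1,
    }

    resp = requests.post(URL, json=payload, timeout=600)
    resp.raise_for_status()

    # The API returns the generated images as base64-encoded PNGs.
    for i, img_b64 in enumerate(resp.json()["images"]):
        with open(f"api_result_{i}.png", "wb") as f:
            f.write(base64.b64decode(img_b64))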
  • #49 21562429
    p.kaczmarek2
    Moderator Smart Home
    As far as I can see, you've run the AUTOMATIC1111 UI and not Fooocus? What does it look like, how does it compare to Fooocus? What kind of workflow do you have?
  • #50 21562441
    andrzejlisek
    Level 31  
    Fooocus didn't work for me, and instead of trying to read up on how to configure and run it, I took the easy way out and asked GPT to spell out, step by step, exactly what was needed to run Stable Diffusion on a Linux system. It suggested AUTOMATIC1111, and indeed that worked out well, as my system wasn't ready out of the box - hence the additional upgrades and configuration.

    The interface itself is as in the screenshot https://github.com/AUTOMATIC1111/stable-diffusion-webui/blob/master/screenshot.png . Functionally, I think it's almost the same as far as text2img itself is concerned. In my opinion, it makes no difference whether it's AUTOMATIC1111 or Fooocus.

    The workflow, once everything is configured, is:
    1. I start the server and wait for the browser with the page to open.
    2. At the top, I select the model (I may have several models downloaded).
    3. I set all the parameters, seed to 0; note that the server does not "keep" the last used parameters. I leave the other parameters at their defaults.
    4. Using "batch size" or "batch count" I can produce several images at once.

    In practice there are sometimes software glitches. The interface sometimes doesn't want to "grab" the model when I try to select it from the list. Sometimes I've saved myself by moving unneeded files to another folder, so that there was only one model to choose from.

    The computer has 32GB of RAM and 4GB of VRAM. It happens that after a few generations the system slows down a lot; then I kill the server and the computer comes back to life. Simply closing the browser and restarting the server is enough.

    I tested the same prompts as in the first post, and with the same juggernautXL_v8Rundiffusion.safetensors model I was getting images in the same style as those posted there.

Topic summary

The discussion centers on generating AI-based images locally using Stable Diffusion models through user-friendly web interfaces like Fooocus and AUTOMATIC1111's stable-diffusion-webui. Fooocus offers fully offline image generation with features such as prompt-based creation, GPT2-assisted prompt development, upscaling, inpainting, outpainting, and image variation. Performance depends heavily on GPU VRAM, with advanced models like Stable Diffusion XL requiring 8GB to 24GB VRAM, favoring high-end NVIDIA GPUs (RTX 3090, 4090, 5090) due to Tensor core optimization. Users compare generation times and quality across GPUs (e.g., GTX1060 vs. RTX3070). Challenges include AI limitations in rendering complex elements like hands, wires, subtitles, and technical schematics, with "keyword/style bleed" affecting image consistency. Technical drawing generation remains problematic; however, language models can produce descriptive ASCII schematics and netlists, suggesting future workflows may combine textual planning with graphical output. Seed control enables deterministic image reproduction, and parameters like CFG/guidance scale and sampling steps influence image coherence and randomness. Commercial use of AI-generated images requires caution regarding trademark infringement, with some companies adding AI-generated disclaimers. Installation guidance for Ubuntu 20.04 includes NVIDIA driver updates, Python environment setup, and model downloads from Hugging Face. AUTOMATIC1111's web UI is functionally similar to Fooocus, with workflows involving model selection, parameter tuning, and batch generation followed by upscaling. Recent advances include OpenAI's token-based image generation (GPT-4o), which reasons in pixel space and shows promise for complex, interactive graphics. The AI-driven GPU demand has impacted hardware prices, complicating access for casual users.
Summary generated by the language model.