
[AI] Graphics and image generator on your own computer - web interface for Stable Diffusion

p.kaczmarek2
  • Fooocus interface for generating images with a cabin in a snowy forest.
    How do I generate images on my own computer? How to change the content of a picture? Is AI able to suggest an interface sketch for a smart home controller? This is what I will try to find out here, all thanks to an easy-to-use web interface based on the Stable Diffusion XL architecture.

Fooocus, presented here, is a completely free, open-source and 100% local (no internet connection needed, everything runs on your computer) environment for creating images with AI models based on Stable Diffusion. Fooocus offers, among other things (a minimal Python sketch of the equivalent underlying pipeline follows the list):
    - image generation based on prompts
- GPT-2-based prompt expansion
    - image upscaling, i.e. increasing the resolution
    - outpainting, i.e. adding parts of an image outside its frame
    - inpainting, i.e. editing part of an image
    - variation, i.e. creating different versions of the image
    - image prompt
- describe image, i.e. generating a text prompt from an existing image using the selected model
    - negative prompt, i.e. what you don't want in the image
- the possibility of integrating LoRA add-ons and styles and setting their weights, which gives us greater control over what we generate
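To give a feel for what sits underneath such a UI, here is a minimal, hedged Python sketch that drives a Stable Diffusion XL checkpoint directly through the Hugging Face diffusers library. The model id, prompt and parameters are illustrative; Fooocus adds its own samplers, styles, prompt expansion and LoRA handling on top of this kind of pipeline.

```python
# Minimal text-to-image sketch with the diffusers library (illustrative only;
# this is not Fooocus' internal code, just the same kind of SDXL pipeline).
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",  # any SDXL checkpoint should work here
    torch_dtype=torch.float16,
).to("cuda")                                     # CPU also works, but is very slow

image = pipe(
    prompt="A cozy cabin in a snowy forest at sunset, warm light glowing from the windows",
    negative_prompt="blurry, low quality",       # the "negative prompt" mentioned above
    num_inference_steps=30,                      # more steps: slower, usually cleaner
    guidance_scale=7.0,                          # how strongly to follow the prompt
).images[0]

image.save("cabin.png")
```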

Install and run.
Download the package from the repository, extract it and launch run.bat:
Fooocus and python_embedded folders and the run.bat file in a file explorer.
Newer versions also include separate run_realistic.bat and run_anime.bat scripts; they work the same way, you just pick the graphics style at startup.
After a moment, the Fooocus page will open in the browser:
Fooocus application interface for image generation on the computer.
    Time to start the fun.

First generations.
I must warn at the outset that the graphics take quite a long time to generate. On my machine (Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz, 64GB RAM, GeForce GTX 1060) each image takes about three minutes.
We simply type in the prompt and select the options. Sometimes you have to fiddle a bit to get the desired effect. In addition, the keywords used affect the style of the whole image, so it's hard to make one consistent 'series' of graphics. But let's give it a go; first, a sample prompt. All images below were generated on my laptop.
    Model used: juggernautXL_v8Rundiffusion.safetensors
    
    A cozy cabin in a snowy forest at sunset, warm light glowing from the windows, smoke rising from the chimney
    
A wooden cabin in a snowy forest at sunset, with warm light glowing from the windows and smoke rising from the chimney. A wooden cabin in a snowy forest at sunset with smoke rising from the chimney. Cabin in a snowy forest at sunset with smoke rising from the chimney.
    Pretty good, but why is the chimney on fire?


    
    Futuristic city skyline at night, neon lights reflecting on wet streets, flying cars in the sky
    
Futuristic city at night with neon lights reflecting on wet streets, flying cars in the sky. Futuristic city at night with neon lights reflecting on wet streets and flying cars. Futuristic city at night with neon lights and flying cars over a wet street.
The colour scheme is great, but the vehicles and backgrounds quickly stop making sense.

    
    Astronaut floating peacefully above Earth, with stars and galaxies in the background, contemplative mood
    
Astronaut floating in space with Earth and stars in the background. Astronaut floating in space above Earth, with a galaxy in the background. Astronaut in a spacesuit floating above Earth in outer space.
    Strange details again.

    
    A cat café with cozy decor, wooden furniture, and playful cats lounging around while people sip coffee
    
Cat café with cats on shelves and a table. Cozy cat café with wooden furniture where people enjoy coffee. Interior of a cozy cat café with pink decor and cats on the furniture.
Here the artefacts are much more visible; this is often what you get when trying to generate a scene with many characters.

    
    A cute fox wearing a small backpack, wandering through an autumn forest filled with orange and red leaves
    

Illustration of a fox with a backpack walking through an autumn forest. Plush fox with a backpack walking through an autumn forest. A cute fox with a backpack walking through an autumn forest covered in orange leaves.
    Quite good, just always a slightly different character.

    
    A massive floating island with waterfalls cascading into the clouds, bioluminescent plants glowing softly
    
Three generated variants of a floating island with waterfalls and bioluminescent plants.

    
    A colossal mechanical dragon flying over a steampunk city, gears and smoke in the background
    
Mechanical dragon flying over a steampunk city. A massive mechanical dragon flying over a steampunk city. Massive mechanical dragon flying over a steampunk city.


Electronics through the eyes of AI.
    Just out of curiosity - does AI know what an Arduino looks like?
    
    	arduino, electronics, table
    

Two Arduino microcontrollers with wires on a wooden table. Set of electronic components on a brown table, including Arduino boards and multicolored wires. Electronic setup on a wooden table with connected wires and a computer in the background.
Something must have been in the training data, but what comes out is still heresy....
    And the diagrams?
    
    schematic, full bridge rectifier, graetz, electronics
    

    Complex electrical schematic with distorted text.
Complex electrical schematic with illegible labels.
I wasn't expecting much, but this font is cool too. Still, this "RECCTITICIFER"...
The next generation produced... a picture of the component itself:
Three bridge rectifiers with unreadable labels on a gray surface.
    How about AI generating us retro hardware?
    unitra, radio, retro, table, receiver
    

Stylish retro radio with a wooden case and metal knobs. Old Unitra radio with a wooden casing and classic panel. Retro radio on a wooden table with a distinctive round knob.
And here is perhaps another weird experiment....
    
    polish car, fiat 126p, retro, polish city, street
    

    A small retro car driving on a street with historic buildings in the background. An orange retro car parked on a cobblestone street with colorful buildings in the background. Retro car parked on a cobblestone street in a city, with classic architecture in the background.

Inpainting.
    Inpainting allows us to modify or completely change part of the image content. For example, we can add a travelling companion to our fox. To use inpainting, turn on the "input image" option and drag the image we want to edit there. We then select the zones we are going to edit.
Animated fox with a backpack walking through an autumn forest.
    The prompt I gave was similar to the one before, only now it included the word "owl".
    The first two results:
Fox with a backpack in an autumn forest with an owl. A plush fox with a backpack walking in an autumn forest.
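The same kind of inpainting step can also be scripted directly with the diffusers library; below is a hedged sketch only - the file names, mask handling and model id are illustrative, and Fooocus does not necessarily do exactly this internally.

```python
# Inpainting sketch: repaint only the masked region of an existing image.
# Illustrative paths; white pixels in the mask mark the area to change.
import torch
from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image

pipe = AutoPipelineForInpainting.from_pretrained(
    "diffusers/stable-diffusion-xl-1.0-inpainting-0.1",
    torch_dtype=torch.float16,
).to("cuda")

init_image = load_image("fox.png")        # the previously generated fox
mask_image = load_image("fox_mask.png")   # white where the owl should appear

result = pipe(
    prompt="a cute fox with a small backpack and an owl companion, autumn forest",
    image=init_image,
    mask_image=mask_image,
    strength=0.85,                # how much the masked area is allowed to change
    num_inference_steps=30,
).images[0]

result.save("fox_with_owl.png")
```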

Trials of practical application.
Finally, I tried to use the generator to create sketches/proposals for various kinds of "smart home" devices, for example a heating controller, a weather station or some other type of touch panel. I know these models do not generate readable text, but maybe they will give me some ideas?
    
    smart home, ui, design, chart, heater, interface, screenshot
    
User interface on a large screen with charts and text in an unreadable language. Smart home interface display with charts and graphics. Smart control panel on the wall with a graphic interface. Smartphone displaying a smart home app user interface. User interface on a mobile device for a smart home. Heater panel interface with charts and indicators. Tablet screen displaying a smart home user interface with charts and statistics. Person using a smart home interface on a tablet with charts and icons. Wall-mounted control panel with colorful chart and round dial below.
From a distance it still resembles something, and the colour scheme is probably OK too, but the text and details are drawn completely wrong.

Summary.
This was a short presentation showing what images can easily be generated on typical consumer hardware. It didn't turn out to be that difficult at all and the results are pretty good too, although I'm sure opinions will be divided. Out of curiosity I also tried to generate slightly more practical things, but in that area the model used limps along, so I wouldn't expect much from it.
I'll leave the final verdict to you; I'll just emphasise that all generation is 100% local, as well as highly customisable and modifiable, so it might be worth looking into, if only out of privacy concerns.
    Do you use image generators and if so, what for?

  • #2 21484579
    VIGOR_PICTURES
    Level 21  
Cool stuff. I'm going to test it on my desktop tomorrow. The extra options beyond the usual image generation are especially interesting. Maybe it will catch on and make life easier ;)
  • #3 21484664
    p.kaczmarek2
    Moderator Smart Home
You certainly need to be patient, as generating good graphics can take a bit more trial and error than with ready-made, closed generators that are not available for download.

    It is also a good idea to disable styles and "prompt expansion" to start with.

    I wonder how good the generators will be in 5-10 years?
  • #4 21484758
    speedy9
    Helpful for users
    I like the owl-shaped backpack the best (something went wrong here ;) ) and that cat:

    Owl-shaped backpack standing on autumn leaves.
    A cat hanging from the ceiling by its paws in a room with hanging plants.

There are things that generators can't cope with. Text in images is a common one. They still can't cope with generating train tracks either, especially where they join; their layout makes no sense. The Bing image generator was used below.
Here is an example from the prompt "generate an image of a rural narrow gauge railway station in Greater Poland in the 1920s in an autumn setting. With a train waiting to load grain".
The image depicts a rural train station from the 1920s with a steam train in an autumn setting.
    Two more images from the prompt: "generate an image of a rural narrow gauge railway station in Greater Poland (lowlands) in the 1920s in an autumn scene. Add a train waiting to load grain and a passenger train waiting on a siding for passengers."
The image depicts a rural train station with a train on the tracks in an autumn setting.
    Wagons also like to stand not necessarily on rails ;)

    Image of a rural train station with a freight and passenger train in an autumn setting.

P.S. The chimney of the generated house at the beginning of this topic smokes like a steam engine or some kind of steelworks ;)

    p.kaczmarek2 wrote:
    I wonder how good the generators will be in 5-10 years?
    They will certainly be better. Hopefully they will get rid of some of the artificiality in the generated graphics that you just see.
  • #5 21484781
    VIGOR_PICTURES
    Level 21  
    p.kaczmarek2 wrote:
    You definitely need to be patient
A while ago I installed Fooocus with the default models (it basically installs itself).
I was most curious what the difference in generation time would be between the laptop's GTX 1060 and the RTX 3070 Supreme. Generating a house in a winter setting took me 37 seconds.

Comparison of generating images of a winter cabin in a forest using different model settings on a computer.

Well, nice. I've just discovered that you can go into the advanced settings and select a preset for model choice and generation speed. Now, instead of working, I'm downloading weird stuff and wondering why I need all this ;)
  • #6 21484794
    p.kaczmarek2
    Moderator Smart Home
    speedy9 wrote:
There are things that generators can't cope with. Text in images is a common one. They still can't cope with generating train tracks either, especially where they join; their layout makes no sense
Characters' hands (fingers) are a typical example, and as I found in my tests, wires cause problems too. The more complex the concept, the worse it gets, but it also depends on what was in the training data.

What discourages me most is the so-called "keyword/style bleed", where a single keyword affects the whole image.

    And of interest, try for example generating a wine glass filled to the brim.
The image shows a half-filled wine glass and text about the challenge for AI to generate a fully filled wine glass.
    Four images showing wine glasses surrounded by candles and roses on a wooden table.
  • #7 21484817
    speedy9
    Helpful for users
    p.kaczmarek2 wrote:
    And of interest, try, for example, to generate a wine glass filled to the brim.

    Lol, good one!
Neither Bing nor ChatGPT can actually do it (both use the DALL-E model). Even after pointing out the error and asking it to raise the wine level by 100%, the glass is always half full.
  • #8 21484854
    OPservator
    Level 38  
Apparently ChatGPT-4o can generate simple templates for 3D printing; has anyone checked how well this works or whether the results are usable? Let's give it a simple phone stand....
  • #9 21484866
    speedy9
    Helpful for users
    There are some such generators, but I haven't tested them, I don't have a printer :) E.g. meshy.ai
  • #10 21484934
    p.kaczmarek2
    Moderator Smart Home
    speedy9 wrote:
Neither Bing nor ChatGPT can actually do it (both use the DALL-E model). Even after pointing out the error and asking it to raise the wine level by 100%, the glass is always half-filled.

This illustrates how influential the training data is. By manipulating the training data, you can strongly influence the behaviour of the model. It's not so much 'real' intelligence, as some people think; it's more of a mathematical model based on statistics....
  • #11 21484939
    speedy9
    Helpful for users
    Unfortunately you are right, and this is the biggest threat from any AI. If people start to believe uncritically what AI generates, whether in terms of information or images, it is a very simple way to manipulate society. And AI hallucinates, makes up facts, even if it has access to correct data.
  • #12 21485387
    andrzejlisek
    Level 31  
In my opinion, a good test of these generators is a request for technical and applied graphics, e.g. "spectrum of a high-pressure sodium lamp", "phase diagram of water", "inside of a quartz resonator", which can be easily verified. The OpenAI generator falls flat here. It will scribble something, but it does not reflect reality and is not useful information. And paradoxically, if you ask a textual model, such as GPT-4o or Claude-3.5-Sonnet, it will correctly and truthfully represent the most important characteristic elements. You can also ask for SVG graphics code and you will get it, and that graphic will be more useful.

I've never tested Stable Diffusion, but as you can see, it does very well at producing decorative, fantasy graphics that can be used to decorate web articles, make a postcard or calendar, or be framed and hung on a wall. It remains to be seen how it copes with applied graphics. Although there is already such a test here: the attempt to depict the Fiat 126p. It painted a car that resembles a real Fiat 126p, but is nevertheless different from it.
  • #13 21485812
    gulson
    System Administrator
There is a lack of a model for technical drawings, let alone schematics. There's quite a niche here; it will be interesting to see if anyone ever gets around to it.
Although it won't be easy, because how do you create training data from schematic diagrams, since they are not described in words?
You would probably have to pull in every chip's datasheet.
  • #14 21486147
    OPservator
    Level 38  
If the AI were fed the described diagrams, it might understand something. At the moment, based on a photo, it can match a part to a motorbike and even determine the brand, although it does not notice, for example, the difference in firing-pin length between model years; that data can be supplemented manually, and then it hits the spot. It should be noted that up to a year ago they did not analyse images at all, and if any had this function, it was not sufficiently developed or integrated into their knowledge.

The mere fact of being able to search the internet is already a considerable step forward, but it does not prevent one from perusing Google until the 20th page :)
  • #15 21486226
    Belialek
    Level 22  
Playing with AI on a graphics card with 6GB of VRAM had to end up like this :) On this hardware you run SD 1.4 or 1.5 - i.e. relatively "poorly educated" models from 2-3 years ago. SDXL 1.0 already needs 8GB of VRAM, and the latest SD 3.0/3.5 needs 16GB (optimally 24GB).

In any kind of AI model, it is the amount of VRAM that matters most, because it determines which 'knowledge' resources the model can use. Stable Diffusion 1.5 is a 0.8B model (800 million parameters), whereas SD 3.5 is already an 8B model (8 billion). And this is where the stumbling blocks start - cards with this amount of VRAM are high-end products, so you are practically limited to the RTX 3090, 4090 and 5090 models, and prices are unfortunately prohibitive if you just want to play around. The alternative is to use two cheaper cards (the VRAM adds up). And unfortunately only NVIDIA cards come into play - most of the new models are designed exclusively for Tensor cores. This is a pity, because a Radeon 7900XTX with 24GB of VRAM can be bought for just under 5,000 zł.
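A rough back-of-the-envelope check of why the VRAM requirements scale like this - weights only, fp16 assumed, parameter counts approximate; real usage with text encoders, VAE and activations is noticeably higher:

```python
# Rough VRAM estimate for the model weights alone, assuming fp16 (2 bytes per parameter).
# Real memory use is higher: text encoders, VAE, activations, attention buffers, etc.
def weights_vram_gb(params_billion: float, bytes_per_param: int = 2) -> float:
    return params_billion * 1e9 * bytes_per_param / 1024**3

for name, params in [("SD 1.5 (~0.8B)", 0.8), ("SDXL (~2.6B UNet)", 2.6), ("SD 3.5 Large (8B)", 8.0)]:
    print(f"{name}: ~{weights_vram_gb(params):.1f} GB for the weights alone")
```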

    gulson wrote:
    There is no model for technical drawings, let alone schematics. There's quite a niche here, it will be interesting to see if they ever get around to it.
    Although it won't be easy, because how do you create training data from schematic diagrams? Since they are not described in words?
    You would probably have to pull in every single datasheet of the chip.


    Synopsys.ai :)

Among more down-to-earth, hobbyist systems, Autodesk Fusion 360 has an option in its PCB module for AI assistance in schematic/PCB creation - how advanced it is I don't know, as I have the basic version :)
  • #16 21486237
    OPservator
    Level 38  
The demand for AI has definitely had a negative impact on GPU prices, and as usual it is gamers who lose out.
  • #17 21486273
    Belialek
    Level 22  
    OPservator wrote:
The demand for AI has definitely had a negative impact on GPU prices, and as usual it is gamers who lose out.


This is influenced by many factors, AI being just (or as much as) one of them. I personally was unlucky: due to a failure I had to buy a graphics card in 2022 (the peak of cryptocurrency mining + covid) and instead of the recommended 1,600 zł for an RTX 3060 I had to spend more than twice as much, because the cards simply weren't on the market. Now I had to buy another PC - and we have a war, a scramble over tariffs and an AI boom :) For a mid-range set (9800X3D + 5070 Ti) you have to budget 9-10 thousand.... Overpricing in the PC segment and hardware shortages in shops worldwide is a good subject for another thread :)
  • #18 21486278
    OPservator
    Level 38  
I for one regret not selling my RTX 3070 at the peak, as used ones were going for 6-7k. I bought it literally before the price spike, back in December 2019 :D
If I had sold it then, I'd probably have a 4080 now.

    Added after 2 [minutes]:

    Belialek wrote:
Overpricing in the PC segment
My technical school teacher, a networking guy, always used to say that scrap metal in IT is some of the most expensive scrap metal in the world.
He said this just after he persuaded the school to buy a huge rack to hold his private crypto mining rigs, for which he got fired 3 years later - I think it paid off for him anyway :D
  • #19 21486294
    p.kaczmarek2
    Moderator Smart Home
I would rather not expect a good AI model for technical drawings/schematics any time soon. Text is easier to generate, because it is enough to predict successive tokens, but a technical drawing? A schematic, in the form of a bitmap? As far as I know, Stable Diffusion and the like don't 'understand' what they are drawing, they just perform successive denoising steps according to the patterns they have learned.... I would sooner expect a system that first plans the schematic "textually" in some way and then draws it. I don't really see technical drawings being generated directly as images.
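Very roughly, that 'successive denoising' loop looks like the toy sketch below - the noise predictor here is a dummy stand-in, nothing like a real U-Net or scheduler, it just shows the shape of the iteration:

```python
# Toy sketch of iterative denoising: start from noise and repeatedly subtract
# the noise a learned network predicts. The predictor here is a dummy stand-in.
import torch

def dummy_noise_predictor(x, t):
    # a real model would predict the noise in x, conditioned on timestep t and the text prompt
    return 0.1 * x

def sample(steps=30, shape=(1, 4, 64, 64)):
    x = torch.randn(shape)                 # pure latent noise
    for t in reversed(range(steps)):
        eps = dummy_noise_predictor(x, t)  # "how much noise is in x right now?"
        x = x - eps                        # one denoising step (real schedulers are more involved)
    return x                               # a real pipeline decodes this latent with a VAE

print(sample().shape)
```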

    Well, but we will see over time....
  • #20 21486324
    Belialek
    Level 22  
    p.kaczmarek2 wrote:
I would rather not expect a good AI model for technical drawings/schematics any time soon. Text is easier to generate, because it is enough to predict successive tokens, but a technical drawing? A schematic, in the form of a bitmap? As far as I know, Stable Diffusion and the like don't 'understand' what they are drawing, they just perform successive denoising steps according to the patterns they have learned.... I would sooner expect a system that first plans the schematic "textually" in some way and then draws it. I don't really see technical drawings being generated directly as images.

    Well, but we will see over time...

    Not to look far - model o1 for my prompt: "Generate a schematic of an electronic circuit that, when connected to 230V AC power, will blink a red LED in a system of 2 seconds on, 1 second off. To this, every 3 blinking cycles the SSR-40DA relay will be switched on, and after another 3 cycles it will be switched off. The whole thing is to be done on discrete components (no programmable circuits)."

It threw out the following description:
    Spoiler:
    Summary of operation

    Low-voltage power supply
    Since the circuit needs power to operate the 555 timer, counter and LED, it is necessary to provide a stable direct current (DC) voltage in the range of, for example, 5-12 V.
    The simplest way to do this is to use a small transformer (e.g. 230 V → 12 V), followed by a rectifier bridge, a filter capacitor and possibly a linear stabiliser (e.g. 7812 for 12 V).
    Alternatively, a transformerless power supply (with a series capacitor and appropriate current limitation) can be used, but for safety and reliability reasons, a transformer is usually a better option for such circuits (especially when SSRs need to be powered).

    Astable generator on NE555
    The NE555 (or another classic timer, such as the LM555) operates in astable mode, generating a rectangular waveform with a total period of 3 s.
    The requirement is for the 'high state' to last 2 s and the 'low state' to last 1 s.
    Typical NE555 astable mode timing formulas:
T_high = 0.693 · (R1 + R2) · C and T_low = 0.693 · R2 · C
With an appropriate choice of resistors R1, R2 and capacitor C we obtain T_high ≈ 2 s and T_low ≈ 1 s.
    Output (pin 3) from NE555:
    Controls the LED (via a limiting resistor). When the output is in high state - the LED lights up (2 s), when low - it goes out (1 s).
    Serves as a clock signal (clock) for the counter counting consecutive cycles.

Cycle counter - e.g. CD4017 (a decade counter chip)
    Although the CD4017 is an integrated circuit, it is neither a microcontroller nor a programmable circuit - it is a classic Johnson binary counter with a decade output. It can therefore be considered a 'discrete circuit' in the sense that it is a standard CMOS chip for counting pulses.
    The counter counts consecutive clock edges from the NE555 (every 3 s). There are then successive activations of the outputs Q0, Q1, Q2, ..., Q9.
    We need a 6-step cycle:
    3 steps (e.g. Q0, Q1, Q2) = SSR on
    3 steps (Q3, Q4, Q5) = SSR off
    Then the cycle is to repeat itself.
    It is therefore sufficient to:
Reset the counter as soon as a high state appears on the Q6 output. In the CD4017, the RESET pin is connected (via a resistor or directly) to Q6. This means the counter counts Q0, Q1, Q2, Q3, Q4, Q5, and as soon as it would move to Q6, it is immediately reset to Q0.
    To control the SSR we take OR (or simple diode summing) from the outputs Q0, Q1, Q2 → this will be the "on" state.
    When the counter is on Q3, Q4, Q5 - the output for the SSR will be in the low state (no control signal), so the SSR will be off.

    SSR-40DA diode control
    The SSR-40DA normally accepts a control voltage in the range of 3-32 V DC. It can therefore be directly connected to the output of the CD4017 counter (with OR from Q0, Q1, Q2), although an intermediate transistor is usually recommended.
    The circuit can look like this:
    "SSR_ON" signal (from diode OR of Q0, Q1, Q2) → base of NPN transistor (e.g. BC547) via ~10 kΩ resistor.
    Emitter to ground.
    Collector - output to SSR (plus, of course, a pull-up resistor to +12 V and a protection diode if needed).
    This arrangement will facilitate stable driving of the SSR input.
    Take the SSR power input from the same +12 V (or other low voltage) line, common ground with the 555 circuit and CD4017.

    SSR load on the 230 V side
    The SSR-40DA output controls the 230 V AC mains voltage.
    The load (e.g. motor, heater, lighting, etc.) is connected in series with the SSR output and the 230 V supply line.
    All safety rules must be observed, of course - the SSR will have one side to the line and the other to the load (or vice versa). The (low-voltage) control system must be completely isolated or at least powered by a safety transformer.
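As a quick numeric sanity check of the timing formulas quoted above (the capacitor value is an arbitrary illustrative choice, standard NE555 astable equations assumed):

```python
# Check the NE555 astable values: T_high = 0.693*(R1+R2)*C, T_low = 0.693*R2*C.
# C = 100 uF is an arbitrary illustrative choice.
C = 100e-6                      # farads
T_high, T_low = 2.0, 1.0        # seconds, as in the prompt (2 s on, 1 s off)

R2 = T_low / (0.693 * C)            # ~14.4 kOhm
R1 = T_high / (0.693 * C) - R2      # ~14.4 kOhm

print(f"R1 ~ {R1 / 1e3:.1f} kOhm, R2 ~ {R2 / 1e3:.1f} kOhm")
print(f"check: T_high = {0.693 * (R1 + R2) * C:.2f} s, T_low = {0.693 * R2 * C:.2f} s")
```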


    And this schematic (admittedly in ASCII, but the schematic is a schematic :D )

    ASCII diagram of an electronic circuit for blinking a red LED and controlling an SSR-40DA relay.

Along with a description
    Spoiler:
    Explanation of main blocks

    Power supply (left part of diagram)
    The 230 V AC voltage is stepped down by a transformer to ~12 V AC, then rectified in a bridge, capacitor filtered and (optionally) stabilised (e.g. 7812) to +12 V DC.
    This +12 V line supplies the ICs (NE555, CD4017), the LED with resistor and the SSR control transistor circuit.

    NE555 generator (block A)
    Operates in astable mode, generating a rectangular signal with a period of 3 s (2 s high, 1 s low).
    The output (pin 3) simultaneously:
    flashes an LED (via a series resistor to +12 V or to ground - depending on configuration),
    gives a clock signal (clock) to the CD4017 input.

    CD4017 counter (block B)
    Counts the edges of the signal from NE555 (every 3 s).
    The outputs Q0, Q1, Q2 (consecutive states for 3×3 s = 9 s) are summed by the "OR" diodes, indicating the "SSR ON" state.
    Outputs Q3, Q4, Q5 - no signal ("SSR OFF").
    The output Q6 connected to RESET immediately restarts the counter to Q0 as soon as the counter would like to go to state Q6. Effect: 6-cycle sequence (Q0→Q1→Q2→Q3→Q4→Q5→(reset)→Q0...).

    SSR control transistor
    The summed signal from Q0, Q1, Q2 (via OR) passes through the base resistor to the base of the NPN transistor.
    When the signal is high (during Q0/Q1/Q2), the transistor saturates, applying voltage to the input of the SSR (e.g. SSR-40DA).
    The SSR with the load on its 230 V AC output side will then pass current (ON).
    When Q3/Q4/Q5, the signal is low, the transistor is off, the SSR gets no control current → load cut off (OFF).


It also generated a simplified netlist for KiCad for me, but I have no way of checking it. It's just a blinking LED and a schematic in ASCII, but remember what AI-generated graphics looked like just 2 years ago.

    And this is "only" the o1 model, I don't have access to the o1 pro unfortunately.

EDIT: I'm no expert, but I can see that what it suggested nicely in the description, it "forgot" to add to the schematic (at least R1 and R2 for the NE555) :D I got the whole answer in one go, so if I chatted with it some more it would probably have fixed it.

Generating the schematic itself in a graphical format is indeed cumbersome, but it is enough to have it create text files in the formats used by well-known circuit design programs (Eagle, KiCad) - an even better option, because once the file is imported we not only get a graphical representation, it is also easy to edit.
  • #21 21486421
    p.kaczmarek2
    Moderator Smart Home
    Thanks for the tests. So as I wrote - sooner a language model based on tokens will draw this than a model operating on pixels.
  • #22 21486423
    OPservator
    Level 38  
ChatGPT-4o let me know that it would take a while to generate the schematic and that it would write to me when it had finished.
After 15 minutes I got a message saying that it was unable to produce the schematic, but that it could describe it to me.
  • #23 21486428
    Belialek
    Level 22  
    p.kaczmarek2 wrote:
    Thanks for the tests. So as I wrote - sooner a token-based language model will draw this than a pixel-based model.


Not necessarily - the algorithms themselves try to leapfrog their limitations. Very often I get Python code in response that, when run, does exactly what I asked for, even though ChatGPT was explicitly unable to generate the result for me directly. Nice diagrams in Eagle or other CAD tools are actually text files that the application interprets and displays graphically. I suspect that with a bit of willpower and good prompting I could work out a workflow where I would get code to run that would ultimately generate a schematic in a raster graphics format.
  • #24 21486461
    perfi
    Level 14  
And does anyone know how things stand with the commercial use of graphics generated by such models? What happens if I use a forest generated by such a model (e.g. like the one shown at the beginning) in my game, or decide to build a heating display like the ones in the generated images?
  • #25 21486463
    OPservator
    Level 38  
    Quote:
    The consensus so far is this: The USCO (United States Copyright Office) takes the position that "artificial intelligence does not have the human authorship needed to establish its copyright". This means that authorship and ownership of works created by AI must be attributed to the users creating the queries.

    Samsung got out of this by adding a caption that the image was generated by artificial intelligence.

    Added after 59 [seconds]:

    Quote:
"You can sell your rights to the materials you produce, incorporate them into works such as books, websites and presentations, and commercialise them in other ways." That said, the DALL-E team encourages users to disclose the contribution of AI to their projects.


    Added after 1 [minute]:

The main thing is that the AI image must not contain trademark infringements - i.e. you cannot, for example, sell AI graphics depicting a model of a car branded with the Hot Wheels logo.
  • #26 21495540
    andrzejlisek
    Level 31  
A general question: can this whole offline generator work deterministically? Is it possible to configure it (e.g. by specifying a random seed) so that if I run an image query I get an image, and if I later run the same query I get the same image, not another variant of an image depicting the same thing?

Normally this uses pseudo-randomness, and every time it is run the result is different, as intended. Is it possible to control this, i.e. eliminate the pseudo-randomness or start from the same seed each time?
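I guess under the hood it would be something like passing a fixed-seed generator, as in the diffusers pipelines below (a hedged sketch, untested here; presumably a seed setting in Fooocus' advanced options wraps the same mechanism):

```python
# Hedged sketch: fixing the random seed in an SDXL pipeline so that the same
# prompt + settings + seed reproduces the same image. Model id is illustrative.
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

generator = torch.Generator(device="cuda").manual_seed(1234)  # the fixed seed
image = pipe(
    prompt="a cozy cabin in a snowy forest at sunset",
    generator=generator,              # same seed + same parameters -> same output
    num_inference_steps=30,
).images[0]
image.save("cabin_seed_1234.png")
```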
  • #27 21495667
    gulson
    System Administrator
    Yesterday they released a real bombshell:
    https://openai.com/index/introducing-4o-image-generation/
    Sensational generation of captions, postcards, infographics, icons etc.
    Graphic designers hate them.

    "What's important about this new kind of image generation, which happens with tokens rather than diffusion, is that it effectively reasons in pixel space.
    Example: Ask it to draw a notebook with a blank circle-and-cross grid, and then tell it to make the first move, and then you make the move, and so on."
  • #28 21495680
    OPservator
    Level 38  
[AI] Graphics and image generator on your own computer - web interface for Stable Diffusion

Messing around with it a bit more, I see a mix of Łódź and Warsaw here, but it's not too bad.
  • #29 21495682
    gulson
    System Administrator
    Are you sure this is the latest model released yesterday? Because it looks like the old Dall-E
  • #30 21495705
    OPservator
    Level 38  
Right, something didn't kick in for me.
Colorful graphic depicting a city with architectural elements and artistic patterns.
    I'll admit it came out mega :D

Topic summary

The discussion centers on generating AI-based images locally using the open-source Fooocus web interface built on Stable Diffusion XL architecture. Fooocus supports prompt-based image generation, GPT2-assisted prompt refinement, upscaling, inpainting, outpainting, image variation, and negative prompts, all running offline without internet dependency. Users report varying generation times depending on GPU hardware, with RTX 3070 significantly outperforming GTX 1060. Challenges include AI limitations in rendering complex elements such as human hands, wires, subtitles, and technical schematics. Technical and application graphics remain problematic due to lack of specialized models and training data, with some success noted in text-based schematic generation (ASCII diagrams, netlists) via language models rather than pixel-based diffusion models. VRAM capacity is critical for running advanced models like SDXL 3.0/3.5, requiring high-end NVIDIA GPUs (RTX 3090, 4090, 5090), while AMD cards lack Tensor core support. The impact of AI demand on GPU pricing and availability is also discussed. Deterministic image generation is possible by setting fixed seeds in Fooocus. Legal considerations for commercial use of AI-generated images include avoiding trademark infringements and proper attribution. Recent advances in token-based image generation (e.g., OpenAI 4o) promise improved reasoning in pixel space, potentially benefiting technical and infographic generation. Overall, AI image generation excels in decorative and fantasy graphics but faces significant hurdles in precise technical and schematic rendering.
Summary generated by the language model.