
[AI] Graphics and image generator on your own computer - web interface for Stable Diffusion

p.kaczmarek2
  • Fooocus interface for generating images with a cabin in a snowy forest.
    How do I generate images on my own computer? How to change the content of a picture? Is AI able to suggest an interface sketch for a smart home controller? This is what I will try to find out here, all thanks to an easy-to-use web interface based on the Stable Diffusion XL architecture.

Fooocus, presented here, is a completely free, open-source and 100% local (no internet connection needed, everything runs on your computer) environment for creating images with AI models based on Stable Diffusion. Fooocus offers, among other things (a minimal Python sketch of the equivalent underlying pipeline follows the list):
    - image generation based on prompts
- GPT-2-based prompt expansion
    - image upscaling, i.e. increasing the resolution
    - outpainting, i.e. adding parts of an image outside its frame
    - inpainting, i.e. editing part of an image
    - variation, i.e. creating different versions of the image
    - image prompt
- describe image, i.e. generating a text prompt from an existing image using the selected model
    - negative prompt, i.e. what you don't want in the image
- the possibility of integrating LoRA add-ons and styles and setting their weights, which gives us greater control over what we generate
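To give a feel for what sits underneath such a UI, here is a minimal, hedged Python sketch that drives a Stable Diffusion XL checkpoint directly through the Hugging Face diffusers library. The model id, prompt and parameters are illustrative; Fooocus adds its own samplers, styles, prompt expansion and LoRA handling on top of this kind of pipeline.

```python
# Minimal text-to-image sketch with the diffusers library (illustrative only;
# this is not Fooocus' internal code, just the same kind of SDXL pipeline).
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",  # any SDXL checkpoint should work here
    torch_dtype=torch.float16,
).to("cuda")                                     # CPU also works, but is very slow

image = pipe(
    prompt="A cozy cabin in a snowy forest at sunset, warm light glowing from the windows",
    negative_prompt="blurry, low quality",       # the "negative prompt" mentioned above
    num_inference_steps=30,                      # more steps: slower, usually cleaner
    guidance_scale=7.0,                          # how strongly to follow the prompt
).images[0]

image.save("cabin.png")
```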

Install and run.
Download the package from the repository, extract it and launch run.bat:
Fooocus and python_embedded folders and the run.bat file in a file explorer.
Newer versions also include separate run_realistic.bat and run_anime.bat scripts; they work the same way, you just pick the graphics style at startup.
After a moment, the Fooocus page will open in the browser:
Fooocus application interface for image generation on the computer.
    Time to start the fun.

First generations.
I must warn at the outset that the graphics take quite a long time to generate. On my machine (Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz, 64GB RAM, GeForce GTX 1060) each image takes about three minutes.
We simply type in the prompt and select the options. Sometimes you have to fiddle a bit to get the desired effect. In addition, the keywords used affect the style of the whole image, so it's hard to make one consistent 'series' of graphics. But let's give it a go; first, a sample prompt. All images below were generated on my laptop.
    Model used: juggernautXL_v8Rundiffusion.safetensors
    
    A cozy cabin in a snowy forest at sunset, warm light glowing from the windows, smoke rising from the chimney
    
A wooden cabin in a snowy forest at sunset, with warm light glowing from the windows and smoke rising from the chimney. A wooden cabin in a snowy forest at sunset with smoke rising from the chimney. Cabin in a snowy forest at sunset with smoke rising from the chimney.
    Pretty good, but why is the chimney on fire?


    
    Futuristic city skyline at night, neon lights reflecting on wet streets, flying cars in the sky
    
Futuristic city at night with neon lights reflecting on wet streets, flying cars in the sky. Futuristic city at night with neon lights reflecting on wet streets and flying cars. Futuristic city at night with neon lights and flying cars over a wet street.
The colour scheme is great, but the vehicles and backgrounds quickly stop making sense.

    
    Astronaut floating peacefully above Earth, with stars and galaxies in the background, contemplative mood
    
Astronaut floating in space with Earth and stars in the background. Astronaut floating in space above Earth, with a galaxy in the background. Astronaut in a spacesuit floating above Earth in outer space.
    Strange details again.

    
    A cat café with cozy decor, wooden furniture, and playful cats lounging around while people sip coffee
    
Cat café with cats on shelves and a table. Cozy cat café with wooden furniture where people enjoy coffee. Interior of a cozy cat café with pink decor and cats on the furniture.
Here the artefacts are much more visible; this is often what you get when trying to generate a scene with many characters.

    
    A cute fox wearing a small backpack, wandering through an autumn forest filled with orange and red leaves
    

Illustration of a fox with a backpack walking through an autumn forest. Plush fox with a backpack walking through an autumn forest. A cute fox with a backpack walking through an autumn forest covered in orange leaves.
    Quite good, just always a slightly different character.

    
    A massive floating island with waterfalls cascading into the clouds, bioluminescent plants glowing softly
    
Three generated variants of a floating island with waterfalls and bioluminescent plants.

    
    A colossal mechanical dragon flying over a steampunk city, gears and smoke in the background
    
Mechanical dragon flying over a steampunk city. A massive mechanical dragon flying over a steampunk city. Massive mechanical dragon flying over a steampunk city.


Electronics through the eyes of AI.
    Just out of curiosity - does AI know what an Arduino looks like?
    
    	arduino, electronics, table
    

Two Arduino microcontrollers with wires on a wooden table. Set of electronic components on a brown table, including Arduino boards and multicolored wires. Electronic setup on a wooden table with connected wires and a computer in the background.
Something must have been in the training data, but what comes out is still heresy....
    And the diagrams?
    
    schematic, full bridge rectifier, graetz, electronics
    

    Complex electrical schematic with distorted text.
Complex electrical schematic with illegible labels.
I wasn't expecting much, but this font is cool too. Still, this "RECCTITICIFER"...
The next generation produced... a picture of the component itself:
Three bridge rectifiers with unreadable labels on a gray surface.
    How about AI generating us retro hardware?
    unitra, radio, retro, table, receiver
    

Stylish retro radio with a wooden case and metal knobs. Old Unitra radio with a wooden casing and classic panel. Retro radio on a wooden table with a distinctive round knob.
And here is perhaps another weird experiment....
    
    polish car, fiat 126p, retro, polish city, street
    

    A small retro car driving on a street with historic buildings in the background. An orange retro car parked on a cobblestone street with colorful buildings in the background. Retro car parked on a cobblestone street in a city, with classic architecture in the background.

Inpainting.
    Inpainting allows us to modify or completely change part of the image content. For example, we can add a travelling companion to our fox. To use inpainting, turn on the "input image" option and drag the image we want to edit there. We then select the zones we are going to edit.
Animated fox with a backpack walking through an autumn forest.
    The prompt I gave was similar to the one before, only now it included the word "owl".
    The first two results:
Fox with a backpack in an autumn forest with an owl. A plush fox with a backpack walking in an autumn forest.
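The same kind of inpainting step can also be scripted directly with the diffusers library; below is a hedged sketch only - the file names, mask handling and model id are illustrative, and Fooocus does not necessarily do exactly this internally.

```python
# Inpainting sketch: repaint only the masked region of an existing image.
# Illustrative paths; white pixels in the mask mark the area to change.
import torch
from diffusers import AutoPipelineForInpainting
from diffusers.utils import load_image

pipe = AutoPipelineForInpainting.from_pretrained(
    "diffusers/stable-diffusion-xl-1.0-inpainting-0.1",
    torch_dtype=torch.float16,
).to("cuda")

init_image = load_image("fox.png")        # the previously generated fox
mask_image = load_image("fox_mask.png")   # white where the owl should appear

result = pipe(
    prompt="a cute fox with a small backpack and an owl companion, autumn forest",
    image=init_image,
    mask_image=mask_image,
    strength=0.85,                # how much the masked area is allowed to change
    num_inference_steps=30,
).images[0]

result.save("fox_with_owl.png")
```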

Trials of practical application.
Finally, I tried to use the generator to create sketches/proposals for various kinds of "smart home" devices, for example a heating controller, a weather station or some other type of touch panel. I know these models do not generate readable text, but maybe they will give me some ideas?
    
    smart home, ui, design, chart, heater, interface, screenshot
    
User interface on a large screen with charts and text in an unreadable language. Smart home interface display with charts and graphics. Smart control panel on the wall with a graphic interface. Smartphone displaying a smart home app user interface. User interface on a mobile device for a smart home. Heater panel interface with charts and indicators. Tablet screen displaying a smart home user interface with charts and statistics. Person using a smart home interface on a tablet with charts and icons. Wall-mounted control panel with colorful chart and round dial below.
From a distance it still resembles something, and the colour scheme is probably OK too, but the text and details are drawn completely wrong.

Summary.
This was a short presentation showing what images can easily be generated on typical consumer hardware. It didn't turn out to be that difficult at all and the results are pretty good too, although I'm sure opinions will be divided. Out of curiosity I also tried to generate slightly more practical things, but in that area the model used limps along, so I wouldn't expect much from it.
I'll leave the final verdict to you; I'll just emphasise that all generation is 100% local, as well as highly customisable and modifiable, so it might be worth looking into, if only out of privacy concerns.
    Do you use image generators and if so, what for?

  • #2 21484579
    VIGOR_PICTURES
    Level 21  
Cool stuff. I'm going to test it on my desktop tomorrow. The extra options beyond the usual image generation are especially interesting. Maybe it will catch on and make life easier ;)
  • #3 21484664
    p.kaczmarek2
    Moderator Smart Home
You certainly need to be patient, as generating good graphics can take a bit more trial and error than with ready-made, closed generators that are not available for download.

    It is also a good idea to disable styles and "prompt expansion" to start with.

    I wonder how good the generators will be in 5-10 years?
  • #4 21484758
    speedy9
    Helpful for users
    I like the owl-shaped backpack the best (something went wrong here ;) ) and that cat:

    Owl-shaped backpack standing on autumn leaves.
    A cat hanging from the ceiling by its paws in a room with hanging plants.

There are things that generators can't cope with. Text in images is a common one. They still can't cope with generating train tracks either, especially where they join; their layout makes no sense. The Bing image generator was used below.
Here is an example from the prompt "generate an image of a rural narrow gauge railway station in Greater Poland in the 1920s in an autumn setting. With a train waiting to load grain".
The image depicts a rural train station from the 1920s with a steam train in an autumn setting.
    Two more images from the prompt: "generate an image of a rural narrow gauge railway station in Greater Poland (lowlands) in the 1920s in an autumn scene. Add a train waiting to load grain and a passenger train waiting on a siding for passengers."
The image depicts a rural train station with a train on the tracks in an autumn setting.
    Wagons also like to stand not necessarily on rails ;)

    Image of a rural train station with a freight and passenger train in an autumn setting.

P.S. The chimney of the generated house at the beginning of this topic smokes like a steam engine or some kind of steelworks ;)

    p.kaczmarek2 wrote:
    I wonder how good the generators will be in 5-10 years?
    They will certainly be better. Hopefully they will get rid of some of the artificiality in the generated graphics that you just see.
  • #5 21484781
    VIGOR_PICTURES
    Level 21  
    p.kaczmarek2 wrote:
    You definitely need to be patient
A while ago I installed Fooocus with the default models (it basically installs itself).
I was most curious what the difference in generation time would be between the laptop's GTX 1060 and the RTX 3070 Supreme. Generating a house in a winter setting took me 37 seconds.

Comparison of generating images of a winter cabin in a forest using different model settings on a computer.

Well, nice. I've just discovered that you can go into the advanced settings and select a preset for model choice and generation speed. Now, instead of working, I'm downloading weird stuff and wondering why I need all this ;)
  • #6 21484794
    p.kaczmarek2
    Moderator Smart Home
    speedy9 wrote:
There are things that generators can't cope with. Text in images is a common one. They still can't cope with generating train tracks either, especially where they join; their layout makes no sense
Characters' hands (fingers) are a typical example, and as I found in my tests, wires cause problems too. The more complex the concept, the worse it gets, but it also depends on what was in the training data.

What discourages me most is the so-called "keyword/style bleed", where a single keyword affects the whole image.

    And of interest, try for example generating a wine glass filled to the brim.
The image shows a half-filled wine glass and text about the challenge for AI to generate a fully filled wine glass.
    Four images showing wine glasses surrounded by candles and roses on a wooden table.
  • #7 21484817
    speedy9
    Helpful for users
    p.kaczmarek2 wrote:
    And of interest, try, for example, to generate a wine glass filled to the brim.

    Lol, good one!
Neither Bing nor ChatGPT can actually do it (both use the DALL-E model). Even after pointing out the error and asking it to raise the wine level by 100%, the glass is always half full.
  • #8 21484854
    OPservator
    Level 38  
Apparently ChatGPT-4o can generate simple templates for 3D printing; has anyone checked how well this works or whether the results are usable? Let's give it a simple phone stand....
  • #9 21484866
    speedy9
    Helpful for users
    There are some such generators, but I haven't tested them, I don't have a printer :) E.g. meshy.ai
  • #10 21484934
    p.kaczmarek2
    Moderator Smart Home
    speedy9 wrote:
Neither Bing nor ChatGPT can actually do it (both use the DALL-E model). Even after pointing out the error and asking it to raise the wine level by 100%, the glass is always half-filled.

This illustrates how influential the training data is. By manipulating the training data, you can strongly influence the behaviour of the model. It's not so much 'real' intelligence, as some people think; it's more of a mathematical model based on statistics....
  • #11 21484939
    speedy9
    Helpful for users
    Unfortunately you are right, and this is the biggest threat from any AI. If people start to believe uncritically what AI generates, whether in terms of information or images, it is a very simple way to manipulate society. And AI hallucinates, makes up facts, even if it has access to correct data.
  • #12 21485387
    andrzejlisek
    Level 31  
In my opinion, a good test of these generators is a request for technical and applied graphics, e.g. "spectrum of a high-pressure sodium lamp", "phase diagram of water", "inside of a quartz resonator", which can be easily verified. The OpenAI generator falls flat here. It will scribble something, but it does not reflect reality and is not useful information. And paradoxically, if you ask a textual model, such as GPT-4o or Claude-3.5-Sonnet, it will correctly and truthfully represent the most important characteristic elements. You can also ask for SVG graphics code and you will get it, and that graphic will be more useful.

I've never tested Stable Diffusion, but as you can see, it does very well at producing decorative, fantasy graphics that can be used to decorate web articles, make a postcard or calendar, or be framed and hung on a wall. It remains to be seen how it copes with applied graphics. Although there is already such a test here: the attempt to depict the Fiat 126p. It painted a car that resembles a real Fiat 126p, but is nevertheless different from it.
  • #13 21485812
    gulson
    System Administrator
There is a lack of a model for technical drawings, let alone schematics. There's quite a niche here; it will be interesting to see if anyone ever gets around to it.
Although it won't be easy, because how do you create training data from schematic diagrams, since they are not described in words?
You would probably have to pull in every chip's datasheet.
  • #14 21486147
    OPservator
    Level 38  
If the AI were fed the described diagrams, it might understand something. At the moment, based on a photo, it can match a part to a motorbike and even determine the brand, although it does not notice, for example, the difference in firing-pin length between model years; that data can be supplemented manually, and then it hits the spot. It should be noted that up to a year ago they did not analyse images at all, and if any had this function, it was not sufficiently developed or integrated into their knowledge.

The mere fact of being able to search the internet is already a considerable step forward, but it does not prevent one from perusing Google until the 20th page :)
  • #15 21486226
    Belialek
    Level 22  
Playing with AI on a graphics card with 6GB of VRAM had to end up like this :) On this hardware you run SD 1.4 or 1.5 - i.e. relatively "poorly educated" models from 2-3 years ago. SDXL 1.0 already needs 8GB of VRAM, and the latest SD 3.0/3.5 needs 16GB (optimally 24GB).

In any kind of AI model, it is the amount of VRAM that matters most, because it determines which 'knowledge' resources the model can use. Stable Diffusion 1.5 is a 0.8B model (800 million parameters), whereas SD 3.5 is already an 8B model (8 billion). And this is where the stumbling blocks start - cards with this amount of VRAM are high-end products, so you are practically limited to the RTX 3090, 4090 and 5090 models, and prices are unfortunately prohibitive if you just want to play around. The alternative is to use two cheaper cards (the VRAM adds up). And unfortunately only NVIDIA cards come into play - most of the new models are designed exclusively for Tensor cores. This is a pity, because a Radeon 7900XTX with 24GB of VRAM can be bought for just under 5,000 zł.
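A rough back-of-the-envelope check of why the VRAM requirements scale like this - weights only, fp16 assumed, parameter counts approximate; real usage with text encoders, VAE and activations is noticeably higher:

```python
# Rough VRAM estimate for the model weights alone, assuming fp16 (2 bytes per parameter).
# Real memory use is higher: text encoders, VAE, activations, attention buffers, etc.
def weights_vram_gb(params_billion: float, bytes_per_param: int = 2) -> float:
    return params_billion * 1e9 * bytes_per_param / 1024**3

for name, params in [("SD 1.5 (~0.8B)", 0.8), ("SDXL (~2.6B UNet)", 2.6), ("SD 3.5 Large (8B)", 8.0)]:
    print(f"{name}: ~{weights_vram_gb(params):.1f} GB for the weights alone")
```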

    gulson wrote:
    There is no model for technical drawings, let alone schematics. There's quite a niche here, it will be interesting to see if they ever get around to it.
    Although it won't be easy, because how do you create training data from schematic diagrams? Since they are not described in words?
    You would probably have to pull in every single datasheet of the chip.


    Synopsys.ai :)

Among more down-to-earth, hobbyist systems, Autodesk Fusion 360 has an option in its PCB module for AI assistance in schematic/PCB creation - how advanced it is I don't know, as I have the basic version :)
  • #16 21486237
    OPservator
    Level 38  
The demand for AI has definitely had a negative impact on GPU prices, and as usual it is gamers who lose out.
  • #17 21486273
    Belialek
    Level 22  
    OPservator wrote:
The demand for AI has definitely had a negative impact on GPU prices, and as usual it is gamers who lose out.


This is influenced by many factors, AI being just (or as much as) one of them. I personally was unlucky: due to a failure I had to buy a graphics card in 2022 (the peak of cryptocurrency mining + covid) and instead of the recommended 1,600 zł for an RTX 3060 I had to spend more than twice as much, because the cards simply weren't on the market. Now I had to buy another PC - and we have a war, a scramble over tariffs and an AI boom :) For a mid-range set (9800X3D + 5070 Ti) you have to budget 9-10 thousand.... Overpricing in the PC segment and hardware shortages in shops worldwide is a good subject for another thread :)
  • #18 21486278
    OPservator
    Level 38  
I for one regret not selling my RTX 3070 at the peak, as used ones were going for 6-7k. I bought it literally before the price spike, back in December 2019 :D
If I had sold it then, I'd probably have a 4080 now.

    Added after 2 [minutes]:

    Belialek wrote:
Overpricing in the PC segment
My technical school teacher, a networking guy, always used to say that scrap metal in IT is some of the most expensive scrap metal in the world.
He said this just after he persuaded the school to buy a huge rack to hold his private crypto mining rigs, for which he got fired 3 years later - I think it paid off for him anyway :D
  • #19 21486294
    p.kaczmarek2
    Moderator Smart Home
I would rather not expect a good AI model for technical drawings/schematics any time soon. Text is easier to generate, because it is enough to predict successive tokens, but a technical drawing? A schematic, in the form of a bitmap? As far as I know, Stable Diffusion and the like don't 'understand' what they are drawing, they just perform successive denoising steps according to the patterns they have learned.... I would sooner expect a system that first plans the schematic "textually" in some way and then draws it. I don't really see technical drawings being generated directly as images.
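Very roughly, that 'successive denoising' loop looks like the toy sketch below - the noise predictor here is a dummy stand-in, nothing like a real U-Net or scheduler, it just shows the shape of the iteration:

```python
# Toy sketch of iterative denoising: start from noise and repeatedly subtract
# the noise a learned network predicts. The predictor here is a dummy stand-in.
import torch

def dummy_noise_predictor(x, t):
    # a real model would predict the noise in x, conditioned on timestep t and the text prompt
    return 0.1 * x

def sample(steps=30, shape=(1, 4, 64, 64)):
    x = torch.randn(shape)                 # pure latent noise
    for t in reversed(range(steps)):
        eps = dummy_noise_predictor(x, t)  # "how much noise is in x right now?"
        x = x - eps                        # one denoising step (real schedulers are more involved)
    return x                               # a real pipeline decodes this latent with a VAE

print(sample().shape)
```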

    Well, but we will see over time....
  • #20 21486324
    Belialek
    Level 22  
    p.kaczmarek2 wrote:
I would rather not expect a good AI model for technical drawings/schematics any time soon. Text is easier to generate, because it is enough to predict successive tokens, but a technical drawing? A schematic, in the form of a bitmap? As far as I know, Stable Diffusion and the like don't 'understand' what they are drawing, they just perform successive denoising steps according to the patterns they have learned.... I would sooner expect a system that first plans the schematic "textually" in some way and then draws it. I don't really see technical drawings being generated directly as images.

    Well, but we will see over time...

    Not to look far - model o1 for my prompt: "Generate a schematic of an electronic circuit that, when connected to 230V AC power, will blink a red LED in a system of 2 seconds on, 1 second off. To this, every 3 blinking cycles the SSR-40DA relay will be switched on, and after another 3 cycles it will be switched off. The whole thing is to be done on discrete components (no programmable circuits)."

It threw out the following description:
    Spoiler:
    Summary of operation

    Low-voltage power supply
    Since the circuit needs power to operate the 555 timer, counter and LED, it is necessary to provide a stable direct current (DC) voltage in the range of, for example, 5-12 V.
    The simplest way to do this is to use a small transformer (e.g. 230 V → 12 V), followed by a rectifier bridge, a filter capacitor and possibly a linear stabiliser (e.g. 7812 for 12 V).
    Alternatively, a transformerless power supply (with a series capacitor and appropriate current limitation) can be used, but for safety and reliability reasons, a transformer is usually a better option for such circuits (especially when SSRs need to be powered).

    Astable generator on NE555
    The NE555 (or another classic timer, such as the LM555) operates in astable mode, generating a rectangular waveform with a total period of 3 s.
    The requirement is for the 'high state' to last 2 s and the 'low state' to last 1 s.
    Typical NE555 astable mode timing formulas:
T_high = 0.693 · (R1 + R2) · C and T_low = 0.693 · R2 · C
With an appropriate choice of resistors R1, R2 and capacitor C we obtain T_high ≈ 2 s and T_low ≈ 1 s.
    Output (pin 3) from NE555:
    Controls the LED (via a limiting resistor). When the output is in high state - the LED lights up (2 s), when low - it goes out (1 s).
    Serves as a clock signal (clock) for the counter counting consecutive cycles.

Cycle counter - e.g. CD4017 (a decade counter chip)
    Although the CD4017 is an integrated circuit, it is neither a microcontroller nor a programmable circuit - it is a classic Johnson binary counter with a decade output. It can therefore be considered a 'discrete circuit' in the sense that it is a standard CMOS chip for counting pulses.
    The counter counts consecutive clock edges from the NE555 (every 3 s). There are then successive activations of the outputs Q0, Q1, Q2, ..., Q9.
    We need a 6-step cycle:
    3 steps (e.g. Q0, Q1, Q2) = SSR on
    3 steps (Q3, Q4, Q5) = SSR off
    Then the cycle is to repeat itself.
    It is therefore sufficient to:
Reset the counter as soon as a high state appears on the Q6 output. In the CD4017, the RESET pin is connected (via a resistor or directly) to Q6. This means the counter counts Q0, Q1, Q2, Q3, Q4, Q5, and as soon as it would move to Q6, it is immediately reset to Q0.
    To control the SSR we take OR (or simple diode summing) from the outputs Q0, Q1, Q2 → this will be the "on" state.
    When the counter is on Q3, Q4, Q5 - the output for the SSR will be in the low state (no control signal), so the SSR will be off.

    SSR-40DA diode control
    The SSR-40DA normally accepts a control voltage in the range of 3-32 V DC. It can therefore be directly connected to the output of the CD4017 counter (with OR from Q0, Q1, Q2), although an intermediate transistor is usually recommended.
    The circuit can look like this:
    "SSR_ON" signal (from diode OR of Q0, Q1, Q2) → base of NPN transistor (e.g. BC547) via ~10 kΩ resistor.
    Emitter to ground.
    Collector - output to SSR (plus, of course, a pull-up resistor to +12 V and a protection diode if needed).
    This arrangement will facilitate stable driving of the SSR input.
    Take the SSR power input from the same +12 V (or other low voltage) line, common ground with the 555 circuit and CD4017.

    SSR load on the 230 V side
    The SSR-40DA output controls the 230 V AC mains voltage.
    The load (e.g. motor, heater, lighting, etc.) is connected in series with the SSR output and the 230 V supply line.
    All safety rules must be observed, of course - the SSR will have one side to the line and the other to the load (or vice versa). The (low-voltage) control system must be completely isolated or at least powered by a safety transformer.
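As a quick numeric sanity check of the timing formulas quoted above (the capacitor value is an arbitrary illustrative choice, standard NE555 astable equations assumed):

```python
# Check the NE555 astable values: T_high = 0.693*(R1+R2)*C, T_low = 0.693*R2*C.
# C = 100 uF is an arbitrary illustrative choice.
C = 100e-6                      # farads
T_high, T_low = 2.0, 1.0        # seconds, as in the prompt (2 s on, 1 s off)

R2 = T_low / (0.693 * C)            # ~14.4 kOhm
R1 = T_high / (0.693 * C) - R2      # ~14.4 kOhm

print(f"R1 ~ {R1 / 1e3:.1f} kOhm, R2 ~ {R2 / 1e3:.1f} kOhm")
print(f"check: T_high = {0.693 * (R1 + R2) * C:.2f} s, T_low = {0.693 * R2 * C:.2f} s")
```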


    And this schematic (admittedly in ASCII, but the schematic is a schematic :D )

    ASCII diagram of an electronic circuit for blinking a red LED and controlling an SSR-40DA relay.

Along with a description
    Spoiler:
    Explanation of main blocks

    Power supply (left part of diagram)
    The 230 V AC voltage is stepped down by a transformer to ~12 V AC, then rectified in a bridge, capacitor filtered and (optionally) stabilised (e.g. 7812) to +12 V DC.
    This +12 V line supplies the ICs (NE555, CD4017), the LED with resistor and the SSR control transistor circuit.

    NE555 generator (block A)
    Operates in astable mode, generating a rectangular signal with a period of 3 s (2 s high, 1 s low).
    The output (pin 3) simultaneously:
    flashes an LED (via a series resistor to +12 V or to ground - depending on configuration),
    gives a clock signal (clock) to the CD4017 input.

    CD4017 counter (block B)
    Counts the edges of the signal from NE555 (every 3 s).
    The outputs Q0, Q1, Q2 (consecutive states for 3×3 s = 9 s) are summed by the "OR" diodes, indicating the "SSR ON" state.
    Outputs Q3, Q4, Q5 - no signal ("SSR OFF").
    The output Q6 connected to RESET immediately restarts the counter to Q0 as soon as the counter would like to go to state Q6. Effect: 6-cycle sequence (Q0→Q1→Q2→Q3→Q4→Q5→(reset)→Q0...).

    SSR control transistor
    The summed signal from Q0, Q1, Q2 (via OR) passes through the base resistor to the base of the NPN transistor.
    When the signal is high (during Q0/Q1/Q2), the transistor saturates, applying voltage to the input of the SSR (e.g. SSR-40DA).
    The SSR with the load on its 230 V AC output side will then pass current (ON).
    When Q3/Q4/Q5, the signal is low, the transistor is off, the SSR gets no control current → load cut off (OFF).


It also generated a simplified netlist for KiCad for me, but I have no way of checking it. It's just a blinking LED and a schematic in ASCII, but remember what AI-generated graphics looked like just 2 years ago.

    And this is "only" the o1 model, I don't have access to the o1 pro unfortunately.

EDIT: I'm no expert, but I can see that what it suggested nicely in the description, it "forgot" to add to the schematic (at least R1 and R2 for the NE555) :D I got the whole answer in one go, so if I chatted with it some more it would probably have fixed it.

Generating the schematic itself in a graphical format is indeed cumbersome, but it is enough to have it create text files in the formats used by well-known circuit design programs (Eagle, KiCad) - an even better option, because once the file is imported we not only get a graphical representation, it is also easy to edit.
  • #21 21486421
    p.kaczmarek2
    Moderator Smart Home
    Thanks for the tests. So as I wrote - sooner a language model based on tokens will draw this than a model operating on pixels.
  • #22 21486423
    OPservator
    Level 38  
ChatGPT-4o let me know that it would take a while to generate the schematic and that it would write to me when it had finished.
After 15 minutes I got a message saying that it was unable to produce the schematic, but that it could describe it to me.
  • #23 21486428
    Belialek
    Level 22  
    p.kaczmarek2 wrote:
    Thanks for the tests. So as I wrote - sooner a token-based language model will draw this than a pixel-based model.


Not necessarily - the algorithms themselves try to leapfrog their limitations. Very often I get Python code in response that, when run, does exactly what I asked for, even though ChatGPT was explicitly unable to generate the result for me directly. Nice diagrams in Eagle or other CAD tools are actually text files that the application interprets and displays graphically. I suspect that with a bit of willpower and good prompting I could work out a workflow where I would get code to run that would ultimately generate a schematic in a raster graphics format.
  • #24 21486461
    perfi
    Level 14  
And does anyone know how things stand with the commercial use of graphics generated by such models? What happens if I use a forest generated by such a model (e.g. like the one shown at the beginning) in my game, or decide to build a heating display like the ones in the generated images?
  • #25 21486463
    OPservator
    Level 38  
    Quote:
    The consensus so far is this: The USCO (United States Copyright Office) takes the position that "artificial intelligence does not have the human authorship needed to establish its copyright". This means that authorship and ownership of works created by AI must be attributed to the users creating the queries.

    Samsung got out of this by adding a caption that the image was generated by artificial intelligence.

    Added after 59 [seconds]:

    Quote:
"You can sell your rights to the materials you produce, incorporate them into works such as books, websites and presentations, and commercialise them in other ways." That said, the DALL-E team encourages users to disclose the contribution of AI to their projects.


    Added after 1 [minute]:

The main thing is that the AI image must not contain trademark infringements - i.e. you cannot, for example, sell AI graphics depicting a model of a car branded with the Hot Wheels logo.
  • #26 21495540
    andrzejlisek
    Level 31  
A general question: can this whole offline generator work deterministically? Is it possible to configure it (e.g. by specifying a random seed) so that if I run an image query I get an image, and if I later run the same query I get the same image, not another variant of an image depicting the same thing?

Normally this uses pseudo-randomness, and every time it is run the result is different, as intended. Is it possible to control this, i.e. eliminate the pseudo-randomness or start from the same seed each time?
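I guess under the hood it would be something like passing a fixed-seed generator, as in the diffusers pipelines below (a hedged sketch, untested here; presumably a seed setting in Fooocus' advanced options wraps the same mechanism):

```python
# Hedged sketch: fixing the random seed in an SDXL pipeline so that the same
# prompt + settings + seed reproduces the same image. Model id is illustrative.
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

generator = torch.Generator(device="cuda").manual_seed(1234)  # the fixed seed
image = pipe(
    prompt="a cozy cabin in a snowy forest at sunset",
    generator=generator,              # same seed + same parameters -> same output
    num_inference_steps=30,
).images[0]
image.save("cabin_seed_1234.png")
```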
  • #27 21495667
    gulson
    System Administrator
    Yesterday they released a real bombshell:
    https://openai.com/index/introducing-4o-image-generation/
    Sensational generation of captions, postcards, infographics, icons etc.
    Graphic designers hate them.

    "What's important about this new kind of image generation, which happens with tokens rather than diffusion, is that it effectively reasons in pixel space.
    Example: Ask it to draw a notebook with a blank circle-and-cross grid, and then tell it to make the first move, and then you make the move, and so on."
  • #28 21495680
    OPservator
    Level 38  
[AI] Graphics and image generator on your own computer - web interface for Stable Diffusion

Messing around with it a bit more, I see a mix of Łódź and Warsaw here, but it's not too bad.
  • #29 21495682
    gulson
    System Administrator
    Are you sure this is the latest model released yesterday? Because it looks like the old Dall-E
  • #30 21495705
    OPservator
    Level 38  
Right, something didn't kick in for me.
Colorful graphic depicting a city with architectural elements and artistic patterns.
    I'll admit it came out mega :D

Topic summary

The discussion centers on generating AI-based images locally using the open-source Fooocus web interface built on Stable Diffusion XL architecture. Fooocus supports prompt-based image generation, GPT2-assisted prompt refinement, upscaling, inpainting, outpainting, image variation, and negative prompts, all running offline without internet dependency. Users report varying generation times depending on GPU hardware, with RTX 3070 significantly outperforming GTX 1060. Challenges include AI limitations in rendering complex elements such as human hands, wires, subtitles, and technical schematics. Technical and application graphics remain problematic due to lack of specialized models and training data, with some success noted in text-based schematic generation (ASCII diagrams, netlists) via language models rather than pixel-based diffusion models. VRAM capacity is critical for running advanced models like SDXL 3.0/3.5, requiring high-end NVIDIA GPUs (RTX 3090, 4090, 5090), while AMD cards lack Tensor core support. The impact of AI demand on GPU pricing and availability is also discussed. Deterministic image generation is possible by setting fixed seeds in Fooocus. Legal considerations for commercial use of AI-generated images include avoiding trademark infringements and proper attribution. Recent advances in token-based image generation (e.g., OpenAI 4o) promise improved reasoning in pixel space, potentially benefiting technical and infographic generation. Overall, AI image generation excels in decorative and fantasy graphics but faces significant hurdles in precise technical and schematic rendering.
Summary generated by the language model.