Minitest: robot vision? Multimodal AI LLaVA and workshop photo analysis - 100% local

p.kaczmarek2 2148 10
This content was translated from Polish to English.
  • LLaVA project homepage with logo and information.
    Here I will test in practice how well the LLaVA AI model, which I managed to run locally on my computer, analyzes photos. I will check how advanced this model is, how well it can describe photos from the workshop, and whether it is able to read text from photos.
    What is LLaVA?
    LLaVA stands for "Large Language and Vision Assistant": a large language and vision model, i.e. one able to respond to both text and images. To simplify even further, it is like ChatGPT, but it can also be run locally on our own computer.
    How to run LLaVA?
    I refer here to the previous topic in the series: ChatGPT locally? AI/LLM assistants to run on your computer - download and installation
    LLaVA can be downloaded via the WebUI mentioned in the previous topic, installed locally on Docker, but those interested can also visit the project's home page:
    https://llava-vl.github.io/
    The authors of the project promise quite good results, but of course we cannot be sure that their examples were not selected to look as impressive as possible (so-called cherry-picking):
    Screenshots showing image analysis by different AI models.
    For this reason I ran this model locally and tested it for you with different photos.
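    For anyone who would rather script such tests than click through the WebUI, here is a minimal sketch of sending one photo to a locally served model. It assumes the model is served by Ollama on its default port 11434 under the name `llava`; the post does not say which backend sits behind the WebUI, so treat the URL, the model name and the helper names `build_payload` and `describe_image` as assumptions.

```python
import base64
import json
import urllib.request

# Assumption: an Ollama server on its default port with a model pulled as "llava".
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(image_bytes: bytes, prompt: str, model: str = "llava") -> dict:
    """Build the JSON body; the image is passed as a base64 string."""
    return {
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,  # ask for one complete JSON reply instead of chunks
    }


def describe_image(path: str, prompt: str = "Describe this photo.") -> str:
    """Send one photo to the local server and return the model's text reply."""
    with open(path, "rb") as f:
        payload = build_payload(f.read(), prompt)
    request = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    # CPU-only inference can take minutes per image, so allow a long timeout.
    with urllib.request.urlopen(request, timeout=600) as response:
        return json.loads(response.read())["response"]
```

    With a server running, `describe_image("dryer.jpg")` would return the description as a string; the file name is only an example.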

    So here we go. The testing format will be simple:
    - first, a verbal description of the photo I posted
    - then a screenshot of the conversation with LLaVA
    - and then separately the pros and cons of the AI response in my opinion
    Here are my tests, in no particular order.
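    The plus/minus notes used in the tests can also be tallied mechanically. The sketch below is only an illustration of that format, not something used in the original tests; the function name `tally` is made up for the example.

```python
from collections import Counter


def tally(lines):
    """Count '+' (pro) and '-' (con) lines in the notes for one test."""
    counts = Counter()
    for line in lines:
        stripped = line.strip()
        if stripped.startswith("+"):
            counts["pros"] += 1
        elif stripped.startswith("-"):
            counts["cons"] += 1
    return counts["pros"], counts["cons"]


# Notes from the first test (photo of the inside of the dryer).
dryer_notes = [
    "+ the model recognized that it was a heating element",
    "+ the model recognized that a cable was connected",
    "- the model hallucinated that there was some text overlay here",
    "- the model hallucinated that it saw a table leg",
]
print(tally(dryer_notes))  # (pros, cons)
```

    A raw count treats every note as equally important, so it is at best a rough indicator.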

    Photo of the inside of the dryer
    Dryer heating element on a wooden surface with a connected electrical cord.
    + the model recognized that it was a heating element
    + the model recognized that a cable was connected
    - the model hallucinated that there was some text overlay here
    - the model hallucinated that it saw a table leg

    Screen with the T9SMAX logo from Android Box
    TV screen displaying T9SMAX logo and promotional slogan.
    + the model recognized that it was a TV screen
    + the model recognized that it was the inscription T9 Max (it lost the S?)

    Photo of the dryer:
    Hairdryer with black cord on wooden background
    + the model recognized that it was a dryer with a black cable

    Multimeter:
    Digital multimeter DT832 with test leads on a wooden table.
    + the model recognized that it was a multimeter with wires

    Monitor:
    Computer monitor on a stand oriented upside down on a tiled floor.
    + the model recognized that it was a monitor
    + the model recognized that the monitor was on its side (photo?)
    - the model hallucinated that there was a mouse and keyboard...

    Atari:
    Old Atari 800XL keyboard on a wooden table.
    + the model recognized the hardware as an old IBM keyboard
    - the model hallucinated a numeric keypad to the right?

    Socket timer:
    Electronic device in plastic packaging on a wooden background.
    + the model recognized that it was some kind of programmer
    + the model recognized that it was in a package that had not been opened
    - the model assumed it was a thermostat
    - the model says there is a barcode here, but I don't see any

    Damaged TV (broken matrix):
    A broken LG TV screen with colorful lines.
    + the model knew that it was a damaged TV and that there were visible lines, cracks and colored stripes
    - the model hallucinated the words "Your TV has been damaged" - absurd?

    Smart power socket measure:
    Electrical socket on a workshop table.
    + the model recognized that it was an electronic device
    - the model hallucinates that it is a charger, that it has some connectors, etc

    Screenshot with the MSI logo:
    MSI logo on screen with BIOS message
    + the model correctly recognized the MSI logo
    - but the model also made up some alleged "Raspberry Pi" below

    PCB covered with plastic:
    The image shows a small electronic circuit board submerged in a liquid, likely a PCB.
    + the model recognized that it was a circuit board and that it was flooded

    LED lamp lit:
    A light bulb glowing on a base, lying on a wooden surface.
    + the model recognized that it was a light source
    - the model incorrectly determined the type of lamp and made up a switch
    - the model made up tools in the background

    Flashing LED lamps:
    White LED bulb with attached electronic circuit on a wooden table.
    + the model recognized that it was a "LED bulb"
    + the model also knew that there was a connected system, that it was a DIY system, that there was a prototype board


    Prototype board:
    Copper-colored perforated board on a wooden surface.
    + the model correctly knew what type of board it was and how it was constructed

    Photo of thermal paste replacement:
    Computer motherboard with tools on a wooden table
    + the model recognized quite precisely what was happening there, even indicated RAM, etc

    Screenshot from flasher:
    Screenshot of a computer program showing a list of devices and their addresses.
    + the model recognized that it was a screenshot from software...
    - the model guessed that these were IP addresses, MAC addresses, etc.

    Sonoff NSPanel photo at 17:20:
    View of a digital clock displaying time and weather.
    + the model recognized that the time and weather were displayed
    + the model tried to read the temperature and time, with mixed results (it read 17:09 instead of 17:20, but the temperature was 23°C)
    - the model made a lot of small errors in reading the numbers

    Playstation console:
    PlayStation console with one black controller on a wooden table.
    + the model recognized that it was a Playstation console with a pad
    + the model recognized the SONY inscription
    - the model made up that there are supposedly two controllers here?

    RCA to SCART converter:
    RCA connector on a wooden table.
    + the model recognized that it was related to electronics...
    - apart from that, a total failure, hallucinations about PCBs, ICs, protocols

    Old router:
    Black network router with multiple LAN ports and a detached antenna in front.
    + the model more or less knew that it was a switch, that it had ports, etc
    - the model did not notice the antenna
    - is the model hallucinating that there is a cable here?
    - the model hallucinates that there is an inscription?

    Old router and cell...
    Image of an electronic device with an Ultrafire battery on top.
    + how did the AI know it was Ultrafire?
    + AI also tried to read the capacity, but it was mixed up with the cell type - 1865mAh?
    - AI was unable to determine the type of equipment in the photo

    Old radio:
    Vintage radio on a countertop with tuning buttons and volume knob.
    + AI recognized that it was a radio
    - The AI invented some kind of digital display

    Chip in SOIC package:
    Image of an integrated circuit with the label Winbond on a wooden surface.
    + AI recognized that it was an IC and read "Winbond" as "Winebond"
    - so it did make one typo

    Transformer inside the radio:
    Open electrical appliance with a metallic box and wires connected.
    + AI recognized that it was some electrical device, wires
    - unfortunately it also hallucinates a lot; where are the relays?

    Ball mouse without a ball:
    Old ball mouse with a cord on a wooden surface.
    + AI recognized that it was a mouse
    - AI failed on the mouse cable (it claims it's USB)
    - AI failed to determine the type of mouse (it claims it is optical)

    Loudspeakers:
    Two small yellow speakers connected by cables on a wooden surface.
    + The AI recognized correctly that these were two speakers on the table

    Camping lamp:
    Black and green camping lamp on a wooden background.
    - complete nonsense; where is the power tool?

    LDNIO strip:
    Power strip with two electrical sockets and LDNIO logo.
    + AI recognized that it was an electrical strip
    + AI read the LDNIO logo as LONIO (small typo)
    - However, the AI is wrong about the number of ports

    ESR70 tester with capacitor:
    ESR70 tester displaying 2.8 ohms measurement connected to a capacitor.
    + AI recognized that it was some kind of measurement...
    - unfortunately the AI thought it was an oscilloscope
    + AI read almost correctly 2.8 ohms from the display, but made it 2.9 kHz

    LED ceiling panel:
    Square light panel resting on a surface.
    + AI correctly recognized that it was a square LED panel
    + AI even recognized that the panel was new, in the packaging
    + somehow the AI knew it was in the kitchen (by the stove)

    PCB with a desoldered component:
    Close-up of a circuit board with a missing chip.
    + AI recognized that it was a PCB, etc
    - However, the AI did not know that something was missing

    Reading the time from my DIY clock:
    Digital display showing the time 19:54.
    + AI recognized that it was a clock with time...
    - but it didn't read the time correctly, although it tried; the 1 and the 5 are correct

    Laptop interior:
    Disassembled laptop interior missing disk and RAM.
    + AI recognized that it was a disassembled laptop
    + AI recognized that the disk was missing (not visible)
    - AI thought the disc drive was a battery?
    - AI sees RAM somewhere here?

    XBOX Controller:
    Black Xbox 360 controller lying on a surface.
    + The AI somehow knew that it was specifically an Xbox controller

    Mouse interior:
    Open interior of an electronic device focusing on the mechanism of a computer mouse.
    + AI recognized that it was a mouse, that there was a scroll wheel, etc...
    - why does the AI think that this mouse has a red housing? by PCB?

    Broken monitor:
    Broken computer monitor with cracked screen, on a stand.
    + AI correctly recognized that it was a broken monitor and that it had a broken matrix
    - where does AI see the wall socket?

    Meme with "fuses":
    The image shows a humorous guide to fuse replacement with various items mounted on a board.
    Quote:

    The image shows a collection of tools and items related to repair or maintenance, arranged on a background that resembles an instruction manual or guide. The title at the top reads "GUIDE TO FUSE REPLACEMENT," suggesting that these tools and parts are used for repairing or replacing fuses in various equipment.


    The items included in the display are:



    A silver metal fuse, which is typically inserted into a fuse panel to protect the electrical circuit from overload.

    Screwdrivers of different sizes, which would be used to remove and replace fuses without damaging the surrounding components.

    A small bottle or container with a label that says "500ml," though it's unclear what it contains.

    A small metal object that could possibly be a clip or a tool part.

    An automotive fuse puller, which is used to extract the fuse from its socket without breaking the circuit.

    A plastic tool with an angled head and a notch for gripping the fuse holder in place while removing the fuse.

    A small metal clip, which might be a retention clip for holding the fuse in the socket.

    A plastic tool designed to press down on the fuse holder to secure the fuse in place after it has been removed from the socket.

    An electrical test light, which can be used to visually check if an electrical circuit is open or closed by identifying the illumination of the light bulb when the probe is clipped to the circuit.

    A small metal object that appears to be a pin or a clip with a notch on one end, possibly for holding components together.

    An electronic component resembling an integrated circuit (IC), which could be used in fuse replacement for controlling electrical circuits within a device.


    The items are neatly organized and labeled, indicating that this might be part of an educational display or a DIY repair guide.

    + AI read the text at the top of the image
    - The AI didn't understand that it was a joke...

    Disassembled mouse:
    Disassembled computer mouse on a table with visible internal components.
    + AI recognized that it was a mouse in parts

    Screwdriver length (photo from the web):
    Image of a screwdriver with labeled dimensions.
    Screenshot of a conversation about the length of a screwdriver.
    + The AI somehow knew that the total length was 10 inches...

    Three screwdrivers:
    Three blue screwdrivers with black handles.
    + AI recognized that these were three screwdrivers...

    Hammer graphics:
    The image shows a claw hammer with a metal head and a black and red handle.
    + AI recognized it was a hammer

    Graphic measurement:
    Digital display of a measuring device with Accuenergy branding.
    + The AI tried to read something, and it was quite successful with "500".
    - most of the values are read incorrectly

    Walkman:
    Image of an open electronic device lying on a wooden table.
    - failure, AI thought it was a laptop

    Or maybe without opening it?
    Portable audio device lying on a wooden surface.
    + AI recognized that it was an audio recorder or radio
    - AI decided that LCD is visible here...

    My BK7231 clock:
    Electronic device with LED display and wiring.
    + AI almost read ABCDE, but read it as ABODE, C merged with D

    My clock displaying 20:36:
    Digital clock with red illumination displaying the time 20:36.
    + AI read 2:36
    - however, AI lost the 0

    This is now the temperature reading:
    Image of an LED display showing a temperature of 21°C.
    + AI recognized that there is some temperature here...
    - but the reading is nonsense; where does it see 9:30 p.m.? And 12 degrees...

    Mouse cable:
    The image shows a computer mouse with a blue DE-9 connector cable attached.
    - more of a failure; the AI stubbornly associates mice with USB...

    Summary
    I must admit that it's good, especially compared to a few years ago. This model really can recognize a wide range of objects and sometimes even handles several objects/situations in the frame. Sometimes it can also read text, although it often distorts it. It's not as good as closed solutions, but remember that LLaVA is available for download and can run on our own machine.
    Now the only question is: what to use it for? Maybe as a workshop assistant, although it's probably too early for that? Are we waiting for the 2024/2025 version? I invite you to discuss.
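    To put a rough number on how much the model distorts text, the readings from the tests can be compared with what was actually in the photo. This is only a sketch: it uses the similarity ratio from Python's standard `difflib` module as the measure, which is my own choice and not something used in the tests themselves.

```python
from difflib import SequenceMatcher


def similarity(expected: str, read: str) -> float:
    """Return a 0..1 similarity score between the true text and the model's reading."""
    return SequenceMatcher(None, expected, read).ratio()


# (text actually in the photo, text the model reported), taken from the tests above.
readings = [
    ("Winbond", "Winebond"),
    ("LDNIO", "LONIO"),
    ("ABCDE", "ABODE"),
    ("20:36", "2:36"),
    ("17:20", "17:09"),
]

for expected, read in readings:
    print(f"{expected!r} -> {read!r}: {similarity(expected, read):.2f}")
```

    A high score does not mean a usable reading: a clock read as 2:36 instead of 20:36 is still simply wrong, so for numbers an exact-match check is the more honest measure.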

    About Author
    p.kaczmarek2
    Moderator Smart Home
    Offline 
    p.kaczmarek2 wrote 12163 posts with rating 10121, helped 580 times. Been with us since 2014 year.
  • #2 21035752
    gulson
    System Administrator
    What if we indexed all images on the forum and made it possible to search within those images? 🤔
    It would make finding information much easier.
    But open source models are still not perfect, as the tests show.
    Apparently the best open source one is CogAgent/CogVLM.

    Among the paid models we have:
    GPT-4V, Gemini, Claude
    Thanks for the test!
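    The image search idea above can be sketched as a simple keyword index over generated descriptions. This is only an illustration under assumptions: the image ids are hypothetical, the captions are descriptions from the first post, and a real forum search would need something more robust than exact word matching.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(captions):
    """Map each word to the set of image ids whose caption contains it."""
    index = defaultdict(set)
    for image_id, caption in captions.items():
        for word in tokenize(caption):
            index[word].add(image_id)
    return index


def search(index, query):
    """Return ids of images whose captions contain every word of the query."""
    words = tokenize(query)
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


# Hypothetical image ids; the captions are descriptions from the first post.
captions = {
    "img1": "Digital multimeter DT832 with test leads on a wooden table.",
    "img2": "Old ball mouse with a cord on a wooden surface.",
    "img3": "Black Xbox 360 controller lying on a surface.",
}
index = build_index(captions)
print(search(index, "wooden mouse"))
```

    Since the index is only as good as the descriptions, every hallucinated object in a caption would turn into a false search hit.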
  • #3 21035771
    p.kaczmarek2
    Moderator Smart Home
    This will probably happen sooner or later, as will assistants that visually diagnose a fault and help, for example, with basic tasks such as, I don't know, replacing a light bulb...

    And now some practical information about how quickly the tests went on my equipment. First the specifications:
    Intel Core i7-6700HQ CPU information at 2.60GHz
    RAM before turning on the model (ignore it, I have some small things running in the background):
    Screenshot of Windows Task Manager showing memory usage.
    After turning on the model:
    Screenshot of Windows Task Manager showing RAM usage and CPU activity.
    One more matter: how long it takes to describe an image. It takes a while before the AI starts "speaking":
    Red electrical device with cables on a wooden background.
    And here it is ready - 3 minutes after the start:
    Red electrical device with a power supply connected on a wooden table, with a timer reading 3:00.24 and buttons labeled Runda and Stop.
    In the photo above, the AI obviously didn't recognize what this equipment was, but it also surprised me...
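    Instead of a stopwatch, the response time can also be measured in code. Below is a minimal sketch: `fake_describe` is a hypothetical stand-in for the real model call so that the example runs anywhere, and the default of 3 minutes per image is simply the figure measured above on my hardware.

```python
import time


def timed(func, *args, **kwargs):
    """Call func and return (result, elapsed seconds) using a monotonic clock."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start


def estimate_total_minutes(num_images, minutes_per_image=3.0):
    """Rough total time for a batch, assuming a constant time per image."""
    return num_images * minutes_per_image


def fake_describe(path):
    """Hypothetical stand-in for a real model call."""
    time.sleep(0.05)
    return f"description of {path}"


description, seconds = timed(fake_describe, "photo.jpg")
print(f"{description} ({seconds:.2f} s)")
print(f"100 images: about {estimate_total_minutes(100):.0f} minutes")
```

    At this speed, describing even a hundred photos would take several hours, which matters for any idea of indexing a larger collection.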

    This kind of image description, especially once it improves a little and runs faster, could have many applications: "intelligent" family photo galleries that can be searched with a phrase such as "barbecue photo by the lake", or, in the workshop, maybe some helper for cleaning and organizing. The ability to run it locally solves the privacy issues.
  • #4 21038012
    krzbor
    Level 27  
    It's clear it has issues with older devices/things. For this model, the RS232 connector simply does not exist, nor do RCA to SCART converters or Walkmans. Probably if we showed these things to an eighteen-year-old, he wouldn't recognize them either. What surprised me, however, was that it didn't recognize the Schuko socket. The model was probably trained on American photos :)
    The poor text recognition is also strange - I thought it would be much better.
    The big drawback is the response time - I thought it would take just a few seconds.
  • #5 21040415
    p.kaczmarek2
    Moderator Smart Home
    There was simply no RS232 in the training examples, so it doesn't recognize it. Remember that for this system to recognize RS232 it must be in the training examples, marked manually by humans in multiple images.

    Why are you disappointed by poor text recognition? Do you know how this system works? As far as I know, it is not classic OCR, nor is there an OCR "sub-module"; rather, the ability to recognize text is a side effect of how the AI works...
  • #6 21040981
    ChatGPT
    Level 11  
    I wonder how ChatGPT 4.0 would recognize all this.
  • #7 21041004
    p.kaczmarek2
    Moderator Smart Home
    @ElektrodaBot what do you see in the photo?
    A4Tech computer mouse next to a blue connector.
  • #8 21041006
    ElektrodaBot
    AI OpenAI ChatGPT4
    p.kaczmarek2 wrote:
    The photo shows part of an electrical or electronic device with the "A+ TECH" logo and a blue plug that appears to be an industrial or specialized connector. This plug has a pin connector which suggests it can be used to transmit signals or power in industrial, medical or other specialized applications. This connector has nine pins arranged in three rows of three pins each and is a male connector with a plastic housing.

    If you need help identifying or want to know what this connector or device is for, let me know! As an electronics engineer, I will be happy to help you solve any puzzles or problems related to this device.
    You can call me in any topic by writing @ElektrodaBot. Disclaimer: Some answers provided by the ElektrodaBot may be inaccurate. Elektroda is not responsible for accuracy, reliability or completeness of the information presented. All ElektrodaBot responses should be verified by the user. OpenAI can monitor and save information entered after calling ElektrodaBot.
  • #9 21041011
    p.kaczmarek2
    Moderator Smart Home
    @ElektrodaBot what's in the photo? What does this graphic mean?
    Composition of objects used as fuses with different amperage ratings.
  • #10 21041012
    ElektrodaBot
    AI OpenAI ChatGPT4
    p.kaczmarek2 wrote:
    what's in the photo? What does this graphic mean?
    https://obrazki.elektroda.pl/3783272400_1712818150.png
    The photo shows a graphic titled "Guide to Fuse Replacement", which at first glance may look like instructions for different types of fuses with different current ratings. However, upon closer inspection, you can see that the items shown in the graphic are not actually fuses. This is a humorous show in which everyday objects (such as a piece of aluminum foil, a bullet, a screw, a pin, etc.) and tools (such as a spanner) are used to depict "fuses" with different amperage values.

    Here are the items pictured and their alleged amp ratings:

    - 100 amps: A piece of aluminum foil that could jokingly be used as a makeshift fuse by people who do not understand the dangers or proper operation of real fuses.
    - 250 ampere: cartridge, which is of course a very dangerous and absurd joke, because the use of ammunition in an electrical installation is not only dangerous, but also illegal.
    - 350 Amps (Audiovisual Auto-Alert): The actual fuse may not be available and the item shown in the graphic does not provide any indication of its actual function.
    - 600 amperes: a screw that is not an electrical safety item and would be unsafe to use in this context.
    - 130 amperes: a key to some security device or lock, which is also not a fuse.
    - 1500 Amps (Slow-Blow): Another screw that jokingly suggests being a delayed-acting fuse, which is obviously untrue.
    - 2000 ampere (Slow-Blow): an open-end wrench that is a hand tool, not an element that protects the electrical installation.

    Overall, the graphic is a fun compilation of items that definitely shouldn't be used as fuses, with captions suggesting their unusual uses. Such graphics can be used for humorous or educational purposes to show what definitely not to do in real electrical installations. In fact, using inappropriate items as replacement fuses is very dangerous and can lead to fires or other serious electrical failures.
  • #11 21041054
    p.kaczmarek2
    Moderator Smart Home
    @ChatGPT so ElektrodaBot (ChatGPT4) detected the joke; only LLaVA couldn't handle it.

    If you want, you can invoke ElektrodaBot yourself, but first mention it in the content of the post and then ask the question.

    Added after 1 [minutes]:

    Btw, what this bot wrote earlier regarding the mouse:
    Quote:

    This connector has nine pins arranged in three rows of three pins each

    I see two rows of pins in the photo...

Topic summary

The discussion revolves around the capabilities of the LLaVA (Large Language and Vision Assistant) AI model, which can analyze images and read text locally on a computer. Users are testing its performance with workshop photos, noting its strengths and weaknesses. While LLaVA can describe images, it struggles with older technology recognition and text accuracy. Comparisons are made with other AI models like GPT-4V, Gemini, and Claude, highlighting LLaVA's limitations in recognizing certain connectors and its slower response time. The conversation also touches on the potential for future advancements in AI image indexing and fault detection.
Summary generated by the language model.