logo elektroda

Minitest: robot vision? Multimodal AI LLaVA and workshop photo analysis - 100% local

p.kaczmarek2 
LLaVA project homepage with logo and information.
Here I will test in practice how well the LLaVA AI model, which I managed to run locally on my computer, can analyze photos. I will check how advanced this model is, how well it can describe photos from the workshop, and whether it is able to read text from photos.
What is LLaVA?
LLaVA stands for "Large Language and Vision Assistant", i.e. a large language and vision model that can respond to both text and images. To simplify even more: it is like ChatGPT, but it can also be run locally on our own computer.
How to run LLaVA?
I refer here to the previous topic in the series: ChatGPT locally? AI/LLM assistants to run on your computer - download and installation
LLaVA can be downloaded via the WebUI mentioned in the previous topic, installed locally on Docker, but those interested can also visit the project's home page:
https://llava-vl.github.io/
The authors of the project promise quite good results, but of course we cannot be sure that their examples were not selected to look as impressive as possible (so-called cherry-picking):
Screenshots showing image analysis by different AI models.
For this reason I ran this model locally and tested it for you with different photos.
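For readers who would rather script such tests than click through the WebUI, here is a minimal sketch of sending one photo to a locally served model. It rests on assumptions not stated in this post: that the model is served by Ollama on its default local port under the name `llava`. The file name in the usage note is hypothetical too.

```python
import base64
import json
import urllib.request

# Assumption: an Ollama server on the default local port serves a model named "llava".
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(image_bytes: bytes, prompt: str, model: str = "llava") -> dict:
    """Build the JSON body for a single, non-streaming question about one image."""
    return {
        "model": model,
        "prompt": prompt,
        # The image is passed as a base64-encoded string.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }


def describe_image(path: str, prompt: str = "What is in this photo?") -> str:
    """Send one photo to the local server and return the model's answer."""
    with open(path, "rb") as f:
        payload = build_payload(f.read(), prompt)
    request = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]
```

With the server running, `print(describe_image("multimeter.jpg"))` would print the model's description of that photo. Nothing leaves the machine, which is the point of running the model locally.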

So here we go. The testing format will be simple:
- first, a verbal description of the photo I posted
- then a screenshot of the conversation with LLaVA
- and then separately the pros and cons of the AI response in my opinion
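The format above can also be kept as structured notes, which makes it easy to tally pros and cons later. The sketch below is only an illustration: the `PhotoTest` class and its `score` method are hypothetical names, and the two entries paraphrase results from the tests in this post.

```python
from dataclasses import dataclass, field


@dataclass
class PhotoTest:
    """One test: what was photographed, plus the reviewer's pros and cons."""
    subject: str
    pros: list[str] = field(default_factory=list)
    cons: list[str] = field(default_factory=list)

    def score(self) -> int:
        """Pros minus cons; a crude tally, not a real quality metric."""
        return len(self.pros) - len(self.cons)


# Two entries paraphrased from the tests in this post.
tests = [
    PhotoTest(
        "inside of the dryer",
        pros=["recognized a heating element", "recognized a connected cable"],
        cons=["hallucinated a text overlay", "hallucinated a table leg"],
    ),
    PhotoTest("multimeter", pros=["recognized a multimeter with wires"]),
]

for t in tests:
    print(f"{t.subject}: +{len(t.pros)} / -{len(t.cons)} (score {t.score()})")
```

A plain difference of counts treats every remark as equally important, which is a simplification. A hallucinated object is arguably worse than a missed detail.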
Here are my tests, in no particular order.

Photo of the inside of the dryer
Dryer heating element on a wooden surface with a connected electrical cord.
+ the model recognized that it was a heating element
+ the model recognized that a cable was connected
- the model hallucinated that there was some text overlay here
- the model hallucinated that it saw a table leg

Screen with the T9SMAX logo from Android Box
TV screen displaying T9SMAX logo and promotional slogan.
+ the model recognized that it was a TV screen
+ the model recognized that it was the inscription T9 Max (it lost the S?)

Photo of the dryer:
Hairdryer with black cord on wooden background
+ the model recognized that it was a dryer with a black cable

Multimeter:
Digital multimeter DT832 with test leads on a wooden table.
+ the model recognized that it was a multimeter with wires

Monitor:
Computer monitor on a stand oriented upside down on a tiled floor.
+ the model recognized that it was a monitor
+ the model recognized that the monitor was on its side (photo?)
- the model hallucinated that there was a mouse and keyboard...

Atari:
Old Atari 800XL keyboard on a wooden table.
+ the model recognized the hardware as an old IBM keyboard
- the model hallucinated numeric keypad to the right?

Socket timer:
Electronic device in plastic packaging on a wooden background.
+ the model recognized that it was some kind of programmer
+ the model recognized that it was in a package that had not been opened
- the model assumed it was a thermostat
- the model says there is a barcode here, I don't see any

Damaged TV (broken matrix):
A broken LG TV screen with colorful lines.
+ the model knew that it was a damaged TV and that there were visible lines, cracks and colored stripes
- the model hallucinated the words "Your TV has been damaged" - absurd?

Smart socket with power measurement:
Electrical socket on a workshop table.
+ the model recognized that it was an electronic device
- the model hallucinates that it is a charger, that it has some connectors, etc

Screenshot with the MSI logo:
MSI logo on screen with BIOS message
+ the model correctly recognized the MSI logo
- but the model also made up some alleged "Raspberry Pi" below

PCB covered with plastic:
The image shows a small electronic circuit board submerged in a liquid, likely a PCB.
+ the model recognized that it was a circuit board and that it was covered with something

LED lamp lit:
A light bulb glowing on a base, lying on a wooden surface.
+ the model recognized that it was a light source
- the model incorrectly determined the type of lamp and made up a switch
- the model made up tools in the background

Flashing LED lamps:
White LED bulb with attached electronic circuit on a wooden table.
+ the model recognized that it was a "LED bulb"
+ the model also knew that there was a connected system, that it was a DIY system, that there was a prototype board


Prototype board:
Copper-colored perforated board on a wooden surface.
+ the model correctly knew what type of board it was and how it was constructed

Photo of a paste replacement:
Computer motherboard with tools on a wooden table
+ the model recognized quite precisely what was happening there, even indicated RAM, etc

Screenshot from flasher:
Screenshot of a computer program showing a list of devices and their addresses.
+ the model recognized that it was a screenshot from software...
- the model guessed that these were IP addresses, MAC addresses, etc.

Sonoff NSPanel photo at 17:20:
View of a digital clock displaying time and weather.
+ the model recognized that the time and weather were displayed
+ the model tried to read the temperature and time, with mixed results (17:09 instead of 17:20, although the 23°C was correct)
- the model made a lot of small errors in reading the numbers

Playstation console:
PlayStation console with one black controller on a wooden table.
+ the model recognized that it was a Playstation console with a pad
+ the model recognized the SONY inscription
- the model made up that there are supposedly two controllers here?

RCA to SCART converter:
RCA connector on a wooden table.
+ the model recognized that it was related to electronics...
- apart from that, a total failure, hallucinations about PCBs, ICs, protocols

Old router:
Black network router with multiple LAN ports and a detached antenna in front.
+ the model more or less knew that it was a switch, that it had ports, etc
- the model did not notice the antenna
- is the model hallucinating that there is a cable here?
- the model hallucinates that there is an inscription?

Old router and cell...
Image of an electronic device with an Ultrafire battery on top.
+ how did the AI know it was Ultrafire?
+ AI also tried to read the capacity, but it was mixed up with the cell type - 1865mAh?
- AI was unable to determine the type of equipment in the photo

Old radio:
Vintage radio on a countertop with tuning buttons and volume knob.
+ AI recognized that it was a radio
- The AI invented some kind of digital display

Chip in a SOIC package:
Image of an integrated circuit with the label Winbond on a wooden surface.
+ AI recognized that it was an IC and read "Winbond" as "Winebond"
- so it did make one typo

Transformer inside the radio:
Open electrical appliance with a metallic box and wires connected.
+ AI recognized that it was some electrical device, wires
- unfortunately it also hallucinates a lot; where are the relays?

Ball mouse without a ball:
Old ball mouse with a cord on a wooden surface.
+ AI recognized that it was a mouse
- AI failed on the mouse cable (it claims it's USB)
- AI failed to determine the type of mouse (it claims it is optical)

Loudspeakers:
Two small yellow speakers connected by cables on a wooden surface.
+ The AI recognized correctly that these were two speakers on the table

Camping lamp:
Black and green camping lamp on a wooden background.
- total nonsense, where's the power tool?

LDNIO strip:
Power strip with two electrical sockets and LDNIO logo.
+ AI recognized that it was an electrical strip
+ AI read the LDNIO logo as LONIO (small typo)
- However, the AI is wrong about the number of ports

ESR70 tester with capacitor:
ESR70 tester displaying 2.8 ohms measurement connected to a capacitor.
+ AI recognized that it was some kind of measurement...
- unfortunately the AI thought it was an oscilloscope
+ AI read almost correctly 2.8 ohms from the display, but made it 2.9 kHz

LED ceiling panel:
Square light panel resting on a surface.
+ AI correctly recognized that it was a square LED panel
+ AI even recognized that the panel was new, in the packaging
+ somehow the AI knew it was in the kitchen (by the stove)

PCB with a desoldered component:
Close-up of a circuit board with a missing chip.
+ AI recognized that it was a PCB, etc
- However, the AI did not know that something was missing

Reading the time from my DIY clock:
Digital display showing the time 19:54.
+ AI recognized that it was a clock with time...
- but it didn't read the time correctly, although it tried; 1 and 5 are correct

Laptop interior:
Disassembled laptop interior missing disk and RAM.
+ AI recognized that it was a disassembled laptop
+ AI recognized that the disk was missing (not visible)
- AI thought the disc drive was a battery?
- AI sees RAM somewhere here?

XBOX Controller:
Black Xbox 360 controller lying on a surface.
+ The AI somehow knew that it was specifically an XBox controller

Mouse interior:
Open interior of an electronic device focusing on the mechanism of a computer mouse.
+ AI recognized that it was a mouse, that there was a scroll wheel, etc...
- why does the AI think that this mouse has a red housing? by PCB?

Broken monitor:
Broken computer monitor with cracked screen, on a stand.
+ AI correctly recognized that it was a broken monitor and that it had a broken matrix
- where does AI see the wall socket?

Meme with "fuses":
The image shows a humorous guide to fuse replacement with various items mounted on a board.
Quote:

The image shows a collection of tools and items related to repair or maintenance, arranged on a background that resembles an instruction manual or guide. The title at the top reads "GUIDE TO FUSE REPLACEMENT," suggesting that these tools and parts are used for repairing or replacing fuses in various equipment.


The items included in the display are:



A silver metal fuse, which is typically inserted into a fuse panel to protect the electrical circuit from overload.

Screwdrivers of different sizes, which would be used to remove and replace fuses without damaging the surrounding components.

A small bottle or container with a label that says "500ml," though it's unclear what it contains.

A small metal object that could possibly be a clip or a tool part.

An automotive fuse puller, which is used to extract the fuse from its socket without breaking the circuit.

A plastic tool with an angled head and a notch for gripping the fuse holder in place while removing the fuse.

A small metal clip, which might be a retention clip for holding the fuse in the socket.

A plastic tool designed to press down on the fuse holder to secure the fuse in place after it has been removed from the socket.

An electrical test light, which can be used to visually check if an electrical circuit is open or closed by identifying the illumination of the light bulb when the probe is clipped to the circuit.

A small metal object that appears to be a pin or a clip with a notch on one end, possibly for holding components together.

An electronic component resembling an integrated circuit (IC), which could be used in fuse replacement for controlling electrical circuits within a device.


The items are neatly organized and labeled, indicating that this might be part of an educational display or a DIY repair guide.

+ AI read the text at the top of the image
- The AI didn't understand that it was a joke...

Disassembled mouse:
Disassembled computer mouse on a table with visible internal components.
+ AI recognized that it was a mouse in parts

Screwdriver length (photo from the web):
Image of a screwdriver with labeled dimensions.
Screenshot of a conversation about the length of a screwdriver.
+ The AI somehow knew that the total length was 10 inches...

Three screwdrivers:
Three blue screwdrivers with black handles.
+ AI recognized that these were three screwdrivers...

Hammer graphics:
The image shows a claw hammer with a metal head and a black and red handle.
+ AI recognized it was a hammer

Graphic measurement:
Digital display of a measuring device with Accuenergy branding.
+ The AI tried to read something, and it was quite successful with "500".
- most of the values are read incorrectly

Walkman:
Image of an open electronic device lying on a wooden table.
- failure, AI thought it was a laptop

Or maybe without opening it?
Portable audio device lying on a wooden surface.
+ AI recognized that it was an audio recorder or radio
- AI decided that LCD is visible here...

My BK7231 clock:
Electronic device with LED display and wiring.
+ AI almost read ABCDE, but read it as ABODE, C merged with D

My clock displaying 20:36:
Digital clock with red illumination displaying the time 20:36.
+ AI read 2:36
- however, the AI dropped the 0

And now a temperature reading:
Image of an LED display showing a temperature of 21°C.
+ AI recognized that there is some temperature here...
- but the reading is nonsense; where does 9:30 p.m. come from? And 12 degrees...

Mouse cable:
The image shows a computer mouse with a blue DE-9 connector cable attached.
- rather a failure, the AI stubbornly associates mice with USB...

Summary
I must admit that it's good, especially compared to a few years ago. This model can really recognize a wide range of objects and can sometimes even handle several objects or situations in the frame. Sometimes it can also read text, although it often distorts it. It's not as good as closed solutions, but remember that LLaVA is available for download and can run on our own machine.
Now the only question is: what to use it for? Maybe a workshop assistant, although it's probably too early for that? Are we waiting for the 2024/2025 version? I invite you to discuss.
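To put a rough number on how badly the text was distorted, the misreadings from the tests above can be compared with the real inscriptions. The sketch below uses Levenshtein (edit) distance. That choice of metric is mine, not something used during the tests, and the pairs are simply the readings quoted in this post.

```python
def edit_distance(expected: str, read: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions and substitutions separating the two strings."""
    previous = list(range(len(read) + 1))
    for i, e_char in enumerate(expected, start=1):
        current = [i]
        for j, r_char in enumerate(read, start=1):
            cost = 0 if e_char == r_char else 1
            current.append(min(
                previous[j] + 1,         # skip a character of `expected`
                current[j - 1] + 1,      # skip a character of `read`
                previous[j - 1] + cost,  # match or substitute
            ))
        previous = current
    return previous[-1]


# (real text, text reported by the model), taken from the tests in this post
readings = [
    ("Winbond", "Winebond"),
    ("LDNIO", "LONIO"),
    ("ABCDE", "ABODE"),
    ("20:36", "2:36"),
    ("17:20", "17:09"),
]

for expected, read in readings:
    print(f"{expected!r} read as {read!r}: {edit_distance(expected, read)} edit(s)")
```

Most of these misreadings are a single character off, which matches the impression that the model reads text "almost" correctly. For clock and meter readings, though, one wrong digit is already a wrong value.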

About Author
p.kaczmarek2
p.kaczmarek2 wrote 12163 posts with rating 10121, helped 580 times. Been with us since 2014.

Comments

gulson 07 Apr 2024 09:42

What if we indexed all images on the forum and made it possible to search within those images? 🤔 It would make finding information much easier. But open source models are still not perfect, as the tests...

p.kaczmarek2 07 Apr 2024 09:56

This will probably happen sooner or later, as well as assistants who visually determine the fault and help, for example, perform basic activities such as, I don't know, replacing a light bulb... And...

krzbor 08 Apr 2024 20:44

It's clear it has issues with older devices/things. For it, the RS232 connector simply does not exist, just like RCA to SCART or walkmen. Probably if we showed these things to an eighteen-year-old, he...

p.kaczmarek2 10 Apr 2024 18:08

There was simply no RS232 in the training examples, so it doesn't recognize it. Remember that for this system to recognize RS232 it must be in the training examples, marked manually by humans in multiple...

ChatGPT 11 Apr 2024 08:19

I wonder how ChatGPT 4.0 would recognize all this.

p.kaczmarek2 11 Apr 2024 08:44

@ElektrodaBot what do you see in the photo? https://obrazki.elektroda.pl/2810756700_1712817834_thumb.jpg

ElektrodaBot 11 Apr 2024 08:45

The photo shows part of an electrical or electronic device with the "A+ TECH" logo and a blue plug that appears to be an industrial or specialized connector. This plug has a pin connector which suggests...

p.kaczmarek2 11 Apr 2024 08:49

@ElektrodaBot what's in the photo? What does this graphic mean? https://obrazki.elektroda.pl/3783272400_1712818150_thumb.jpg

ElektrodaBot 11 Apr 2024 08:50

The photo shows a graphic titled "Guide to Fuse Replacement", which at first glance may look like instructions for different types of fuses with different current ratings. However, upon closer inspection,...

p.kaczmarek2 11 Apr 2024 09:21

@chatgpt so ElektrodaBot (ChatGPT4) detected the joke; only LLaVA couldn't handle it. If you want, you can invoke ElektrodaBot yourself, but first invoke it in the content of the post and then post...
