
In this post I test in practice how well the LLaVA AI model, which I managed to run locally on my computer, can analyze photos. I check how well it describes photos from the workshop and whether it can read text from them.
What is LLaVA?
LLaVA stands for "Large Language and Vision Assistant", i.e. a large model that can respond to both text and images. To simplify even further: it is like ChatGPT, but it can be run locally on our own computer.
How to run LLaVA?
For installation, see the previous topic in the series: ChatGPT locally? AI/LLM assistants to run on your computer - download and installation
LLaVA can be downloaded through the WebUI described in that topic, installed locally on Docker. Those interested can also visit the project's home page:
https://llava-vl.github.io/
The authors of the project promise quite good results, but of course we cannot be sure that their examples were not selected to look as impressive as possible (so-called cherry-picking).

For this reason I ran the model locally and tested it for you on a range of photos.
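For readers who would rather script such tests than click through the WebUI, here is a minimal sketch of sending one photo to a locally running model. It assumes the model is served by Ollama on its default port under the name "llava"; the post does not state which backend sits behind the WebUI, so the URL, the model name, and the helper names build_payload and describe_photo are assumptions for illustration only.

```python
import base64
import json
import urllib.request

# Assumption: an Ollama server is listening on its default local port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_payload(image_bytes, prompt="Describe this photo.", model="llava"):
    """Build the JSON body for a single, non-streaming image query."""
    return {
        "model": model,  # assumed model name
        "prompt": prompt,
        # Images are passed as base64-encoded strings.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,  # ask for one complete JSON answer
    }


def describe_photo(path, prompt="Describe this photo.", url=OLLAMA_URL):
    """Send the photo at `path` to the local model and return its text answer."""
    with open(path, "rb") as f:
        payload = build_payload(f.read(), prompt)
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]
```

With the server running, calling describe_photo("dryer.jpg") (a hypothetical file name) should return a description similar to the ones shown in the screenshots. The tests in this post were done by hand in the WebUI, not with this script.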
So here we go. The testing format will be simple:
- first, a verbal description of the photo I posted
- then a screenshot of the conversation with LLaVA
- and then separately the pros and cons of the AI response in my opinion
Here are my tests, in no particular order.
Photo of the inside of the dryer:

+ the model recognized that it was a heating element
+ the model recognized that a cable was connected
- the model hallucinated that there was some text overlay here
- the model hallucinated that it saw a table leg
Screen with the T9SMAX logo from an Android box:

+ the model recognized that it was a TV screen
+ the model read the inscription as "T9 Max" (although it lost the S)
Photo of the dryer:

+ the model recognized that it was a dryer with a black cable
Multimeter:

+ the model recognized that it was a multimeter with wires
Monitor:

+ the model recognized that it was a monitor
+ the model recognized that the monitor was on its side (photo?)
- the model hallucinated that there was a mouse and keyboard...
Atari:

+ the model recognized the hardware as an old IBM keyboard
- the model hallucinated a numeric keypad on the right
Socket timer:

+ the model recognized that it was some kind of programmable timer
+ the model recognized that it was in a package that had not been opened
- the model assumed it was a thermostat
- the model says there is a barcode here, but I don't see any
Damaged TV (broken panel):

+ the model knew that it was a damaged TV and that there were visible lines, cracks and colored stripes
- the model hallucinated the words "Your TV has been damaged" - absurd?
Smart socket with power metering:

+ the model recognized that it was an electronic device
- the model hallucinated that it was a charger, that it had some connectors, etc.
Screenshot with the MSI logo:

+ the model correctly recognized the MSI logo
- but the model also made up some alleged "Raspberry Pi" below
PCB covered with plastic:

+ the model recognized that it was a circuit board and that it was potted
LED lamp lit:

+ the model recognized that it was a light source
- the model incorrectly determined the type of lamp and made up a switch
- the model made up tools in the background
Flashing LED lamps:

+ the model recognized that it was a "LED bulb"
+ the model also knew that there was a connected circuit, that it was a DIY build, and that there was a prototype board
Prototype board:

+ the model correctly knew what type of board it was and how it was constructed
Photo of a thermal paste replacement:

+ the model recognized quite precisely what was happening there, even indicated RAM, etc
Screenshot from flasher:

+ the model recognized that it was a screenshot from software...
- the model guessed that these were IP addresses, MAC addresses, etc.
Sonoff NSPanel photo at 17:20:

+ the model recognized that the time and weather were displayed
+ the model tried to read the temperature and time, with mixed results (17:09 instead of 17:20, though it got 23°C)
- the model made a lot of small errors in reading the numbers
Playstation console:

+ the model recognized that it was a Playstation console with a pad
+ the model recognized the SONY inscription
- the model made up a second controller; where does it see two of them?
RCA to SCART converter:

+ the model recognized that it was related to electronics...
- apart from that, a total failure, hallucinations about PCBs, ICs, protocols
Old router:

+ the model more or less knew that it was a switch, that it had ports, etc
- the model did not notice the antenna
- is the model hallucinating that there is a cable here?
- the model hallucinates that there is an inscription?
Old router and cell...

+ how did the AI know it was Ultrafire?
+ AI also tried to read the capacity, but it was mixed up with the cell type - 1865mAh?
- AI was unable to determine the type of equipment in the photo
Old radio:

+ AI recognized that it was a radio
- The AI invented some kind of digital display
Chip in SOIC package:

+ AI recognized that it was an IC and read "Winbond" as "Winebond"
- so it made just one typo
Transformer inside the radio:

+ AI recognized that it was some electrical device, wires
- unfortunately it also hallucinates a lot; where are the relays?
Ball mouse without a ball:

+ AI recognized that it was a mouse
- AI failed on the mouse cable (it claims it's USB)
- AI failed to determine the type of mouse (it claims it is optical)
Loudspeakers:

+ The AI recognized correctly that these were two speakers on the table
Camping lamp:

- total nonsense; where is the power tool?
LDNIO power strip:

+ AI recognized that it was a power strip
+ AI read the LDNIO logo as LONIO (small typo)
- however, the AI got the number of ports wrong
ESR70 tester with capacitor:

+ AI recognized that it was some kind of measurement...
- unfortunately the AI thought it was an oscilloscope
+ AI almost correctly read 2.8 ohms from the display, but turned it into 2.9 kHz
LED ceiling panel:

+ AI correctly recognized that it was a square LED panel
+ AI even recognized that the panel was new, in the packaging
+ somehow the AI knew it was in the kitchen (by the stove)
PCB with a desoldered element:

+ AI recognized that it was a PCB, etc.
- however, the AI did not notice that something was missing
Reading the time from my DIY clock:

+ AI recognized that it was a clock with time...
- but it didn't read the time correctly, although it tried; the 1 and the 5 are correct
Laptop interior:

+ AI recognized that it was a disassembled laptop
+ AI recognized that the disk was missing (not visible)
- AI thought the disc drive was a battery?
- AI sees RAM somewhere here?
Xbox controller:

+ The AI somehow knew that it was specifically an Xbox controller
Mouse interior:

+ AI recognized that it was a mouse, that there was a scroll wheel, etc...
- why does the AI think that this mouse has a red housing? Because of the PCB?
Broken monitor:

+ AI correctly recognized that it was a broken monitor and that it had a broken panel
- where does AI see the wall socket?
Meme with "fuses":

Quote:
The image shows a collection of tools and items related to repair or maintenance, arranged on a background that resembles an instruction manual or guide. The title at the top reads "GUIDE TO FUSE REPLACEMENT," suggesting that these tools and parts are used for repairing or replacing fuses in various equipment.
The items included in the display are:
A silver metal fuse, which is typically inserted into a fuse panel to protect the electrical circuit from overload.
Screwdrivers of different sizes, which would be used to remove and replace fuses without damaging the surrounding components.
A small bottle or container with a label that says "500ml," though it's unclear what it contains.
A small metal object that could possibly be a clip or a tool part.
An automotive fuse puller, which is used to extract the fuse from its socket without breaking the circuit.
A plastic tool with an angled head and a notch for gripping the fuse holder in place while removing the fuse.
A small metal clip, which might be a retention clip for holding the fuse in the socket.
A plastic tool designed to press down on the fuse holder to secure the fuse in place after it has been removed from the socket.
An electrical test light, which can be used to visually check if an electrical circuit is open or closed by identifying the illumination of the light bulb when the probe is clipped to the circuit.
A small metal object that appears to be a pin or a clip with a notch on one end, possibly for holding components together.
An electronic component resembling an integrated circuit (IC), which could be used in fuse replacement for controlling electrical circuits within a device.
The items are neatly organized and labeled, indicating that this might be part of an educational display or a DIY repair guide.
+ AI read the text at the top of the image
- The AI didn't understand that it was a joke...
Disassembled mouse:

+ AI recognized that it was a mouse in parts
Screwdriver length (photo from the web):


+ The AI somehow knew that the total length was 10 inches...
Three screwdrivers:

+ AI recognized that these were three screwdrivers...
Hammer graphic:

+ AI recognized it was a hammer
Measurement graphic:

+ The AI tried to read the values and did quite well with "500"
- most of the other values were read incorrectly
Walkman:

- failure, AI thought it was a laptop
Or maybe without opening it?

+ AI recognized that it was an audio recorder or radio
- AI decided that an LCD is visible here...
My BK7231 clock:

+ AI almost read ABCDE, but read it as ABODE, C merged with D
My clock displaying 20:36:

+ AI read 2:36
- however, the AI dropped the 0
Now the temperature reading:

+ AI recognized that there is some temperature here...
- but the reading is nonsense; where did 9:30 p.m. and 12 degrees come from?
Mouse cable:

- rather a failure; the AI stubbornly associates mice with USB...
Summary
I must admit that it's good, especially compared to what was possible a few years ago. This model really can recognize a wide range of objects and can sometimes even handle several objects or situations in one frame. Sometimes it can also read text, although it often distorts it. It's not as good as closed solutions, but remember that LLaVA is available for download and can run on our own machine.
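To put a rough number on "distorts it", the readings reported in the tests above can be compared with the real text using edit distance (the number of single-character insertions, deletions, and substitutions). This is only an illustrative sketch: the metric and the helper names edit_distance and score_readings are my own choice, and the pairs are limited to the cases where both the real and the misread text are given above.

```python
def edit_distance(expected: str, actual: str) -> int:
    """Levenshtein distance between the real text and the model's reading."""
    # previous[j] = distance between the processed prefix of `expected`
    # and the first j characters of `actual`.
    previous = list(range(len(actual) + 1))
    for i, e_char in enumerate(expected, start=1):
        current = [i]
        for j, a_char in enumerate(actual, start=1):
            cost = 0 if e_char == a_char else 1
            current.append(min(
                previous[j] + 1,         # character dropped by the model
                current[j - 1] + 1,      # character added by the model
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


# (text in the photo, text reported by the model), taken from the tests above
READINGS = [
    ("Winbond", "Winebond"),
    ("LDNIO", "LONIO"),
    ("ABCDE", "ABODE"),
    ("20:36", "2:36"),
    ("17:20", "17:09"),
]


def score_readings(readings):
    """Return (expected, actual, distance) for every pair."""
    return [(e, a, edit_distance(e, a)) for e, a in readings]


for expected, actual, distance in score_readings(READINGS):
    print(f"{expected!r} read as {actual!r}: {distance} character edit(s)")
```

Four of these five readings are a single character off and the NSPanel time is two characters off, which matches the impression that the model gets close to the text but rarely gets it exactly right.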
Now the only question is: what to use it for? Maybe as a workshop assistant, although it's probably too early for that. Shall we wait for the 2024/2025 version? I invite you to discuss.