Is Qwen3.5 suitable for photo annotation and OCR? Practical tests on your own computer

p.kaczmarek2 1113 3

TL;DR

Qwen3.5 models were tested locally for photo annotation and OCR on Elektroda forum images, using Ollama and a simple image search engine built from generated tags.
A fixed prompt asked each model for more than 25 generic tags per image, and a script iterated images through the Ollama API to populate a searchable GitHub gallery.
On a laptop, qwen3.5:0.8b took 33.40–126.89 s per image, 2b 39.22–98.43 s, 4b 129.77–394.49 s, and 27b 1152.14–2661.42 s.
The results were better than earlier models, with many correct tags and occasional OCR wins like LN882H, but hallucinations, unstable outputs, and multi-minute runtimes still limit practical large-scale use.

Generated by the language model.


Post #1

21862765 15 Mar 2026 10:08

Will the Qwen3.5 models make it possible to build a simple image search engine driven by a verbal description of the content? Could local AI take on the role of OCR? In the previous topic, AI described more than 1000 images from the Elektroda forum; here I have enriched that collection with descriptions from the Qwen models 0.8b, 2b, 4b, 9b, 27b and 35b. I have packed the results into a simple search engine. Time to see how it worked out.

Let's start with the runtime environment. The whole thing ran just fine on my computer, thanks to the Ollama environment:
https://ollama.com/library
But of course I didn't collect the tags manually; I wrote a simple program that iterates over the images and generates tags for them through the Ollama API:
Ollama API Tutorial - chatbots AI 100% locally for use in your own projects
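
A minimal sketch of what such a tagging script could look like, not the author's actual program. It assumes Ollama is listening on its default local address and uses the `/api/generate` endpoint with a base64-encoded image; the function names and the folder layout are hypothetical.

```python
import base64
import json
import urllib.request
from pathlib import Path

# Assumption: Ollama runs locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/generate"
PROMPT = ("Choose more than 25 generic tags for this image, sorted from most "
          "matching to less matching. Reply just with tags, separated by ;")


def build_payload(model: str, image_bytes: bytes, prompt: str = PROMPT) -> dict:
    """Build the JSON body for a single non-streaming generate request."""
    return {
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,
    }


def tag_image(model: str, image_path: Path) -> str:
    """Send one image to the local Ollama server and return the raw reply text."""
    body = json.dumps(build_payload(model, image_path.read_bytes())).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]


def tag_folder(model: str, folder: Path) -> dict[str, str]:
    """Tag every JPEG/PNG file in a folder and map file name to raw reply."""
    results = {}
    for path in sorted(folder.iterdir()):
        if path.suffix.lower() in {".jpg", ".jpeg", ".png"}:
            results[path.name] = tag_image(model, path)
    return results
```

Calling `tag_folder("qwen3.5:2b", Path("images"))` would then return one raw reply per file, provided the model has been pulled beforehand.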

The second important thing is the prompt. I didn't want a prompt strictly for OCR, and the prompt had already been fixed after the first round of testing, so I couldn't simply change it. I kept it as it was before:


Choose more than 25 generic tags for this image, sorted from most matching to less matching. Reply just with tags, separated by ;
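
Since the prompt asks for tags separated by `;`, the reply has to be split before it can be indexed. The sketch below is one possible way to do that; the function names are hypothetical, and lower-casing and de-duplication are my assumptions, not something the original script is known to do.

```python
def parse_tags(reply: str) -> list[str]:
    """Split a semicolon-separated reply into clean, unique, lower-case tags."""
    seen = set()
    tags = []
    # Treat line breaks like separators, since a model may wrap its answer.
    for raw in reply.replace("\n", ";").split(";"):
        tag = raw.strip().strip('"').strip().lower()
        if tag and tag not in seen:
            seen.add(tag)
            tags.append(tag)
    return tags


def has_enough_tags(tags: list[str], minimum: int = 25) -> bool:
    """Check the 'more than 25 tags' requirement from the prompt."""
    return len(tags) > minimum
```

The order of the tags is preserved, so the "most matching first" ranking requested in the prompt survives parsing.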

The question of publication remains - the results are available on GitHub in the form of an image finder. Each image is clickable and its tags can be previewed:
https://openshwprojects.github.io/IndexingElektrodaImages/search2.html
An older version of the site is also available:
https://openshwprojects.github.io/IndexingElektrodaImages/search.html
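
How the published finder searches its tags is not described here, so the following is only a sketch of one simple approach: an inverted index from words to images, where a query matches the images that contain every query word. The file names and tags in the usage note are made up for illustration.

```python
from collections import defaultdict


def build_index(tags_by_image: dict[str, list[str]]) -> dict[str, set[str]]:
    """Map every word that occurs in any tag to the images carrying that tag."""
    index = defaultdict(set)
    for image, tags in tags_by_image.items():
        for tag in tags:
            for word in tag.lower().split():
                index[word].add(image)
    return index


def search(index: dict[str, set[str]], query: str) -> list[str]:
    """Return the images whose tags contain every word of the query."""
    words = query.lower().split()
    if not words:
        return []
    hits = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(hits)
```

For example, with `{"a.jpg": ["bk7231n", "wi-fi module"], "c.jpg": ["wi-fi router"]}` the query `wi-fi` matches both images, while `wi-fi module` matches only `a.jpg`.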

Now we can move on to analysing the results.

Results analysis
The description format will be simple - the title of the test, a screenshot from my site, a link to the actual analysis and then my comment.
Board with BK7231N microcontroller

https://openshwprojects.github.io/IndexingEle...azki.elektroda.pl%2F1667439100_1696245190.jpg
Admittedly the name is clearly visible, but it is still impressive that even the 0.8b version read both the Beken and BK7231N lettering. The catch comes with the name of the entire module, CB2S. CB2S was read only by the 9b and 2b models; even the 35b version missed it. This raises some doubts about the reliability of these models, as the name of such a Wi-Fi module is important information for me.
Apart from that, you can see that many of the tags are quite accurate and correct. Obviously the tags from qwen3.5 are not much better than those from gemma3, llava or minicpm:

Screenshot comparing AI tag outputs for a PCB board with BK7231N across multiple models

One can also fault qwen for some out-of-place tags ("NFC tag reader/charger module"), but this happens much less often than in older models.

Packaging of the LED light

https://openshwprojects.github.io/IndexingEle...azki.elektroda.pl%2F3335603200_1739378203.jpg
The striking thing here is that the 0.8b model, instead of replying with the tags alone, opened its response with an introduction. On the plus side, the tags are largely correct, and even the weakest models recognised that this was the E27 standard. They probably read it from the text.

Atmel chip board

https://openshwprojects.github.io/IndexingEle...azki.elektroda.pl%2F8170563700_1544322617.jpg
I was a little impressed that they decoded the Atmel logo; after all, you can't see it that well. Admittedly, only the larger models found it. The smaller ones were slightly less accurate. For example, the 0.8b detected a potentiometer in the image, which is definitely not there. The ATMEGA1630 from the 2b version is also interesting, since no such ATmega exists.

Schematic of amplifier on EL34

https://openshwprojects.github.io/IndexingEle...azki.elektroda.pl%2F8579096100_1606337170.jpg
This one was simple, but it is still good to know that even the 0.8b model read the EL34 name correctly. The rest of the tags are on target too, with terms that mostly fit schematics. The two largest models also read 12AU7. Overall, I find it hard to point to anything outright nonsensical there.

Switch after flood and hose

https://openshwprojects.github.io/IndexingEle...azki.elektroda.pl%2F4440455200_1744410509.jpg
Now something difficult, and this time without a test of the larger models. Nevertheless, the switch was recognised by models 2b and 4b. The 0.8b model was also close with "switcher". It also identified the device as a "network card" and saw an ethernet cable somewhere, which is unlikely to be correct, as there is no cable in the picture. The "electric scooter" tag from the 0.8b model is interesting. It illustrates well that completely wrong hallucinations still occur in unusual situations.

Panel photo

https://openshwprojects.github.io/IndexingEle...troda.pl%2F3738819600_1748852509_bigthumb.jpg
Another good result. Virtually everything matches, except maybe the "battery" tag from the 0.8b model. That model did, however, stand out with the tag "man in black shirt"; the other models did not identify the shirt.

Build with GPIO descriptions

Here something strange happened with qwen3.5 version 0.8b: it generated what looks like an array written in program code. The texts were read by gemma3 and a newer version of llava. Interestingly, the larger qwen3.5 models also skipped them:

This unfortunately shows that, at least with my prompt, these models are not yet fully reliable, although they still often produce meaningful tags.

OCR screenshot

https://openshwprojects.github.io/IndexingEle...azki.elektroda.pl%2F1417005500_1715987750.png
It worked out quite well. Even the smallest qwen3.5 read the chip name (LN882H) and, from the short name, the code C25E1088. The NUC and Intel tags are a bit of a miss, but that is understandable. The larger the model, though, the less of this hallucination I see. The only problem is the output of the 27b version:


screenshot; user interface; configuration; web interface; command line; startup command; embedded system; firmware; settings page; network device; router; text input; submit button; blue button; dark background; software; technology; system administration; IT; computer screen; digital interface; code snippet; shell command; device information; MAC address; chipset; OpenWrt; Linux system; network settings; peripheral drivers

In this case, the name "OpenLN882H" was not even read. Again, this suggests that the models are not completely stable; perhaps you would need to experiment with the prompt, or repeat the generation several times and keep the common part of the tags.

Old phone

https://openshwprojects.github.io/IndexingEle...azki.elektroda.pl%2F7651628500_1633732341.jpg
Not much to comment on here: the simpler something is, the easier it is to describe, and the same holds for the models. Qwen3.5 version 4b even tried to read the Slican name, but it came out as "SUCCAN brand".

Standard radio

https://openshwprojects.github.io/IndexingEle...azki.elektroda.pl%2F1558884600_1665150103.jpg
A difficult task, so the results are poor. There are no distinctive elements in the photo, so the 0.8b model recognised it as a toy or a model train. Slightly larger models already picked out "stereo receiver" or at least "vintage electronics", so the tags would probably be useful anyway.

Tagging performance
A separate issue is how long it takes to describe a single image. I generated my data on two machines: an old gaming laptop and a more powerful computer, a workstation. The results so far are from the laptop:

HWiNFO64 System Summary screenshot for ASUS G752VM: i7-6700HQ, GTX 1060, 48 GB RAM.

For a sample of 5-8 images per model:

Model	Minimum time (s)	Maximum time (s)	Average time (s)
qwen3.5:0.8b	33.40	126.89	76.64
qwen3.5:2b	39.22	98.43	61.70
qwen3.5:4b	129.77	394.49	222.36
qwen3.5:9b	303.79	755.80	488.44
qwen3.5:27b	1152.14	2661.42	1864.92

The smaller models take about a minute per image; with more images their values would probably separate more clearly. The 4b model already needs about 4 minutes on average, and 9b takes twice as long, over 8 minutes. The 27b on this hardware averages about 30 minutes per image. I have not tested larger models on this laptop.
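
For anyone repeating the measurement, here is a small sketch of how per-image times could be collected and summarised into the same minimum, maximum and average columns. The helper names are hypothetical, and this is not the timing code used for the table above.

```python
import statistics
import time


def timed(func, *args):
    """Run func(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


def summarize(durations: list[float]) -> dict[str, float]:
    """Summarise per-image durations in seconds, plus the mean in minutes."""
    mean = statistics.mean(durations)
    return {
        "min": min(durations),
        "max": max(durations),
        "mean": mean,
        "mean_minutes": mean / 60,
    }
```

Applied to the averages from the table, 222.36 s is about 3.7 minutes, 488.44 s about 8.1 minutes and 1864.92 s about 31.1 minutes.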

I will complete the results from a more powerful computer separately.

Summary
Anyone can view the results for themselves; everything is available, including the repository. Feel free to comment.
In my opinion you can see progress over previous models; qwen3.5 is indeed better. I was somewhat impressed by some of the results, and the fact that even the 0.8b model can generate meaningful tags deserves a mention. The problem, on the other hand, is that the results are quite random and the tagging itself takes a while. With a large photo library, I can't see my gaming laptop, for example, creating the tags: a few minutes per photo is too much.
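
To put "too much" into numbers, the laptop averages from the table can be projected onto a whole library. This is only back-of-the-envelope arithmetic that assumes the average holds for every image; the function name is hypothetical.

```python
# Average seconds per image on the gaming laptop, taken from the table above.
AVG_SECONDS = {
    "qwen3.5:0.8b": 76.64,
    "qwen3.5:2b": 61.70,
    "qwen3.5:4b": 222.36,
    "qwen3.5:9b": 488.44,
    "qwen3.5:27b": 1864.92,
}


def library_hours(model: str, image_count: int) -> float:
    """Estimate the hours needed to tag image_count images with one model."""
    return AVG_SECONDS[model] * image_count / 3600
```

For a collection of 1000 images this gives roughly 17 hours for the 2b model, about 62 hours for 4b, and about 518 hours (over 21 days) for 27b.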
Have you tested Qwen3.5 yet? How efficiently does it run for you, and does it give reasonable results?


About Author

p.kaczmarek2 p.kaczmarek2

Moderator Smart Home

Offline

p.kaczmarek2 wrote 14612 posts with rating 12630, helped 655 times. Member since 2014.

#2 21863330 15 Mar 2026 22:54

Damian_Max Damian_Max

Level 21

Posts: 395

Help: 40

Rate: 96

Hey, you can see that this model works pretty well. I wonder how, or whether, it would work for indexing electronic components.
In another topic, https://www.elektroda.pl/rtvforum/topic4159786.html (although it's not the only one), you are building your own storage for parts and other things, which makes me wonder whether, based on photos of the contents of a component drawer, it could catalogue them well enough, perhaps with some context given in the prompt (if that is possible). I wonder if it would correctly recognise that there are diodes, resistors, relays, switches, etc. in the drawer, and in how much detail. You could also give it two pictures from different perspectives (if that is possible).
#3 21864501 17 Mar 2026 14:43

p.kaczmarek2 p.kaczmarek2

Moderator Smart Home

Posts: 14612

Help: 655

Rate: 12630

Let's try it, starting with the simplest tasks. Do you have any pictures I could use for testing? For now I have taken these myself:

I will run the tagger on them and post the results.

I am creating multiplatform open source firmware (Tasmota replacement), right now supporting BK7231T, BK7231N, XR809, BL602, W800, W600, LN882H and soon supporting RTL and W701:
https://github.com/openshwprojects/OpenBK7231T
If you like my work, support me at: https://paypal.me/openshwprojects

#4 21865081 18 Mar 2026 10:06

oscil1 oscil1

Level 24

Posts: 639

Help: 50

Rate: 176

p.kaczmarek2 wrote:
...generates about a minute, probably with more images the values would be more delineated. The 4b already requires about 4 minutes on average. 9b twice as long - over 8 minutes. 27b on this equipment generates an average of 30 minutes...

I wonder how smoothly this would run on some GPU with more reasonable memory and "tensor cores".

FAQ

TL;DR: Local Qwen3.5 models can tag electronics photos and read visible text, but latency is high (27b averaged 1,864.92 s/image) and results vary; “a few minutes per photo is too much,” reports p.kaczmarek2. [Elektroda, p.kaczmarek2, post #21862765]

Why it matters: If you’re building a private, on‑device image search/OCR for hardware labs or forums, these tests show what’s feasible today and where prompts or model sizes need tuning for reliability.

Quick-Facts

Test setup: Ollama local runtime + simple tagger via Ollama API; tags rendered in a GitHub-backed image finder. [Elektroda, p.kaczmarek2, post #21862765]
Prompt used: “Choose more than 25 generic tags…; reply just with tags separated by ;” (unchanged across runs). [Elektroda, p.kaczmarek2, post #21862765]
OCR highlights: Smallest 0.8b read “BK7231N,” “EL34,” and “LN882H”; larger models sometimes added “12AU7.” [Elektroda, p.kaczmarek2, post #21862765]
Failure cases: Model names like “CB2S” and “OpenLN882H” were missed by some larger sizes; 0.8b hallucinated “ethernet cable.” [Elektroda, p.kaczmarek2, post #21862765]
Throughput on laptop: 0.8b ≈ 76.64 s/image avg; 2b ≈ 61.70 s; 4b ≈ 222.36 s; 9b ≈ 488.44 s; 27b ≈ 1,864.92 s. [Elektroda, p.kaczmarek2, post #21862765]

Model tagging time on a gaming laptop (seconds/image)

Model	Min	Max	Avg
qwen3.5:0.8b	33.40	126.89	76.64
qwen3.5:2b	39.22	98.43	61.70
qwen3.5:4b	129.77	394.49	222.36
qwen3.5:9b	303.79	755.80	488.44
qwen3.5:27b	1152.14	2661.42	1864.92

Source: [Elektroda, p.kaczmarek2, post #21862765]

What is Ollama?

Ollama is a local AI runtime that runs language and vision models on your own computer, exposing a simple API for inference, privacy, and offline use. It powered the Qwen3.5 tagging and OCR tests described in this thread. [Elektroda, p.kaczmarek2, post #21862765]

What is OCR?

OCR is software that converts text visible in images or screenshots into machine‑readable strings, enabling search and analysis; here, OCR refers to Qwen3.5’s ability to read labels like EL34, LN882H, and BK7231N directly from electronics photos. [Elektroda, p.kaczmarek2, post #21862765]

What is GPIO?

GPIO is a general‑purpose input/output interface on microcontrollers that exposes configurable digital pins for reading sensors or driving peripherals; images with printed GPIO legends were included to test whether models extract meaningful tag text. [Elektroda, p.kaczmarek2, post #21862765]

What is EL34?

EL34 is a power pentode vacuum tube used in audio amplifiers; schematic images containing “EL34” and “12AU7” were correctly read as tags by Qwen3.5, including by the 0.8b model for EL34. [Elektroda, p.kaczmarek2, post #21862765]

What is BK7231N?

BK7231N is a Wi‑Fi microcontroller used in IoT modules (e.g., CB2S), where text on the PCB or module shield can be read via OCR; Qwen3.5 models extracted “BK7231N,” though “CB2S” was inconsistently read. [Elektroda, p.kaczmarek2, post #21862765]

Is Qwen3.5 suitable for local photo annotation and OCR?

Yes, for lightweight tagging and reading visible text on electronics images; however, reliability varies by size and prompt, and larger models can still miss short identifiers. The author states “qwen3.5 is indeed better,” but variability persists across runs. [Elektroda, p.kaczmarek2, post #21862765]

Which Qwen3.5 size balances speed and accuracy for electronics tagging?

The 2b model averaged 61.70 s/image and produced generally accurate tags; 4b improved some identifications but rose to 222.36 s/image. Beyond 9b, latency climbs steeply without consistent OCR gains on short strings. [Elektroda, p.kaczmarek2, post #21862765]

How accurate is text reading (OCR) across sizes?

Even 0.8b read “BK7231N,” “EL34,” and “LN882H.” Larger sizes added items like “12AU7,” yet some missed critical tokens such as “CB2S” or “OpenLN882H.” The author suggests that repeating generations and keeping the common tags might mitigate this instability. [Elektroda, p.kaczmarek2, post #21862765]

Why did some models miss CB2S or OpenLN882H while reading larger text?

The thread does not establish a cause. The tests showed that 9b and 2b caught “CB2S” while 35b missed it, and that 27b did not read “OpenLN882H,” which the author takes as a sign that the models are not fully stable and that the prompt may need tuning. [Elektroda, p.kaczmarek2, post #21862765]

How long does tagging take per image on a gaming laptop?

Measured averages were: 0.8b 76.64 s, 2b 61.70 s, 4b 222.36 s, 9b 488.44 s, and 27b 1,864.92 s (~31.1 min). The author concluded, “a few minutes per photo is too much.” [Elektroda, p.kaczmarek2, post #21862765]

How do I reproduce this locally with Ollama?

Install Ollama and pull Qwen3.5 variants.
Use the provided prompt to request “>25 tags; semicolon‑separated” via the Ollama API across your image set.
Render clickable images and tags in a simple web page for review. [Elektroda, p.kaczmarek2, post #21862765]

What prompt worked best for bulk tagging?

A fixed, minimal prompt yielded comparable outputs: “Choose more than 25 generic tags for this image, sorted from most matching to less matching. Reply just with tags, separated by ;” Consistency allowed cross‑model comparisons without confounding prompt changes. [Elektroda, p.kaczmarek2, post #21862765]

How can I reduce hallucinations on unusual or damaged hardware photos?

Use repeated generations and intersect tags, constrain outputs to tag‑only format, and review short identifiers manually. Edge cases included a flooded switch mis‑tagged with “ethernet cable” and “electric scooter.” [Elektroda, p.kaczmarek2, post #21862765]

Does Qwen3.5 outperform Gemma3, LLaVA, or MiniCPM here?

Qwen3.5 produced fewer out‑of‑place tags than older models in these trials, yet overall tag quality was “not much better” than Gemma3, LLaVA, or MiniCPM; trade‑offs remained image‑dependent. [Elektroda, p.kaczmarek2, post #21862765]

What common failure modes should I expect?

Expect misses on tiny model codes (e.g., CB2S), occasional brand misreads (e.g., “SUCCAN” for Slican), non‑tag preambles, or array‑like outputs from 0.8b, plus GUI screenshots losing niche tokens in 27b. [Elektroda, p.kaczmarek2, post #21862765]

Can Qwen3.5 identify logos like Atmel reliably?

Larger models recognized the Atmel logo on a small PCB; smaller ones were less accurate and even hallucinated components (e.g., a potentiometer). Treat logo detection as probabilistic and review critical IDs manually. [Elektroda, p.kaczmarek2, post #21862765]

What hardware upgrades most improve throughput?

Scaling from 4b (222.36 s) to 9b (488.44 s) and 27b (1,864.92 s) on the same laptop shows how steeply latency grows with model size. The thread contains no measurements on faster hardware yet: the author plans to add workstation results separately, and one reply asks how a GPU with more memory and tensor cores would perform. [Elektroda, p.kaczmarek2, post #21862765]