Vision-based AI models for translating catalogue notes - we test Nano Banana, ChatGPT-Image and others

p.kaczmarek2

#1 21795582 01 Jan 2026 15:12
p.kaczmarek2 (Moderator Smart Home, topic author)

Is a catalogue note (datasheet) in a foreign language still a problem in 2026? Today I will test whether artificial intelligence can replace a translator and translate the text in electronic component datasheets into English. Importantly, the whole experiment is based on screenshots (bitmaps) rather than PDF files, so the AI has no way to make its job easier, e.g. by reading the text layer embedded in a PDF. Will this form of datasheet translation be of any use? Let's find out!

The testing methodology is very simple: I take a screenshot of the catalogue note and try to translate it into English by sending it to the AI as an attachment along with a prompt. I use the free LMArena website to run the AI models:
https://lmarena.ai/
After testing, I will subjectively evaluate the models and group them according to their results.

Note: the images are posted exactly as the AI generated them. If something is cropped, that is how the model produced the bitmap.

Test 1 - constant-current LED controller
Inputs:

translate to english

seedream-4-high-res-fal

Such a massive hallucination effectively rules this model out.

reve-v1.1-fast

No translations.

reve-v1.1

Major, unnecessary reworking of the document; only residual translations.

chatgpt-image-latest (20251216)

Slightly better, occasional typos. Almost usable.

gpt-image-1.5

Slightly better, occasional typos. Almost usable.

flux-1-context-pro

No translations.

flux-2-flex

Virtually no translations, except for the title above the document.

flux-2-flex-20251231

Translation failure - random letters and stamps, mostly no translations.

qwen-image-edit

This model hallucinates a strange background and can ruin the document. Useless.

flux-2-max

Vestigial translations; most of the text is a meaningless string of letters. The headline was translated.

flux-2-pro

Nonsensical strings of letters, useless result.

flux-2-pro-20251231

Nonsense strings of letters, useless result.

gemini-2.5-flash-image-preview (nano-banana)

Surprisingly, the older Banana did not translate anything.

gemini-3-pro-image-preview (nano-banana-pro)

Best result so far. Almost all the text is correct, with occasional errors and typos; only in the paragraphs are some words nonsense.

Test 2 - display controller

translate to english

gpt-image-1-mini

The model reworked the image and renamed the chip, but did not translate anything.

gemini-2.5-flash-image-preview (nano-banana)

The older Banana could not handle the translation.

flux-2-pro

The model attempted a translation, but the result is unreadable; virtually only the title is helpful: LED Driver/Keyboard Scan.

flux-2-pro-20251231

Flux 2 deciphered the main keywords, but the rest is useless.

flux-1-context-pro

This model overlaid the text "translate to English" on the image.

flux-2-flex-20251231

Residual translations.

gpt-image-1.5

At first glance very good, but the introduction falls apart from the second or third sentence onwards.

reve-v1.1

The basics were translated, but the model also damaged the introductory diagram.

seedream-4-high-res-fal

The title was translated, but the model added a strange background.

chatgpt-image-latest (20251216)

Like the other OpenAI model, it's not bad, but the introduction falls apart after the first sentences. In addition, the introductory diagram is slightly damaged.

gemini-3-pro-image-preview (nano-banana-pro)

Nano Banana Pro has again performed very well.

Test 3 - synchronous rectifier
This time, a test with a screenshot:

translate to english

qwen-image-edit

Useless result.

chatgpt-image-latest (20251216)

The basic translation is there, but with lots of typos. Synchornous?

gpt-image-1.5

Same as the previous GPT.

seedream-4-high-res-fal

This model has redone the background again...

gpt-image-1

Residual translation. In addition, on another attempt I received a strangely cropped image.

gpt-image-1-mini

What happened here? A short circuit? And between two separate diagrams... In addition, the model also cropped the picture.

flux-2-flex

Again, residual translation.

gemini-2.5-flash-image-preview (nano-banana)

No translation.

seedream-4.5

It came out slightly better this time, but there are still shortcomings.

flux-1-context-pro

No translation.

flux-2-pro

The letters have been changed, but they don't make sense.

gemini-3-pro-image-preview (nano-banana-pro)
Another success story for Nano Banana Pro.

Final ranking of image models
I ran additional tests but did not post more images in the topic, because a thread full of the same nonsense graphics would be unreadable. In the end, I grouped the models by my overall impression, although I noticed that a given model occasionally did better or worse; generation probably involves a random factor (the so-called seed).
Correct translations, occasional errors:
- gemini-3-pro-image-preview (nano-banana-pro)
Almost acceptable translations, but problems with some words and blurred letters:
- chatgpt-image-latest (20251216)
- gpt-image-1.5
Sometimes translates something, sometimes hallucinates and produces nonsense:
- reve-v1.1
- gpt-image-1
Rudimentary translation attempts, meaningless strings of letters:
- flux-2-max
- flux-2-pro
- flux-2-pro-20251231
Tries to translate, but hallucinates and rearranges the image:
- seedream-4-high-res-fal
Can ruin the image:
- qwen-image-edit
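Since the author suspects the run-to-run variation comes from the random seed, one practical response is to run the same screenshot several times and keep the best result. This is a minimal sketch under assumptions: `generate(image, prompt, seed)` and `score(text)` are hypothetical callables (the score could be a count of expected terms or a manual rating); LMArena is a web UI, and no real model API or seed parameter is implied.

```python
from typing import Callable, Iterable, Tuple

# Hypothetical signatures: generate(image_bytes, prompt, seed) -> translated text
# (e.g. OCR of the generated image); score(text) -> higher is better.
Generator = Callable[[bytes, str, int], str]
Scorer = Callable[[str], float]


def best_of_n(image: bytes, prompt: str, generate: Generator,
              score: Scorer, seeds: Iterable[int]) -> Tuple[int, str, float]:
    """Run one prompt with several seeds and return (seed, output, score) of the best run."""
    best = None
    for seed in seeds:
        output = generate(image, prompt, seed)
        s = score(output)
        # Keep the first run that reaches the highest score seen so far.
        if best is None or s > best[2]:
            best = (seed, output, s)
    if best is None:
        raise ValueError("no seeds given")
    return best
```

Each extra seed costs another full generation, so this trades compute for reliability; the outputs still need checking against the original.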

In summary, only the latest Nano Banana Pro seems to give acceptable results when translating datasheet images, although it still occasionally produces artefacts. Just behind it are GPT-Image 1.5 and ChatGPT-Image (20251216), but they are no match for it. The remaining models are useless: some try to remake the image and some ignore the text completely.
AI does not seem to be missing much in this context. I expect that already in 2026 there will be models that handle such translations even better, and even if not, Nano Banana Pro is already satisfactory.
Do you see a use for artificial intelligence as an image translator? Or do you know of other practical applications for Nano Banana Pro and similar models?

About the author: p.kaczmarek2, Moderator Smart Home, joined 26 Dec 2014. 14052 posts, post rating 11874, helped 637 times, 135974 points.
#2 21796137 02 Jan 2026 09:02
fachman1964 (Level 5)

That's very good for a machine anyway. It will still take some time before it reaches perfection. Nevertheless, you can read the information you need from such translations, which is definitely better than Chinese "bushes" (characters). It has always puzzled me why some Chinese manufacturers don't publish datasheets in two languages from the start. Maybe such chips are intended for the Chinese market rather than for export? Because I can't believe they don't speak English.
#3 21796167 02 Jan 2026 09:53
szeryf3 (Level 30)

Artificial intelligence is learning by the day, and I suspect that by the end of this year there will be a visible difference in this area.
Not so long ago this was black magic, and now people can't figure out many things without asking the AI.
#4 21796237 02 Jan 2026 11:16
MikeC (Level 32)

My ChatGPT 5.2 did it differently again:

And this one into English and pseudo-Polish:
#5 21796292 02 Jan 2026 12:06
gulson (System Administrator)

Nano Banana is the best, as usual.

With hundreds of pages, however, it is best to run OCR on the document, i.e. extract the Chinese text, and then translate it yourself with a language model, without vision.
Of course, the result will be plain text without much of the layout, but the efficiency is very high.
You can also first use a model that splits the PDF page into images, tables and text, and feed them separately to the models for translation.
This is also the cheapest solution for many pages.
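The split-then-translate route described above can be sketched as a small pipeline: text and table regions go through OCR and a text-only language model, while images are passed through untouched. This is only an illustration under assumptions: the `Region` type and the `ocr`/`translate` callables are hypothetical stand-ins for whatever layout model, OCR engine and LLM you actually use; no specific library is implied.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Region:
    kind: str      # "text", "table" or "image" (hypothetical layout-model labels)
    pixels: bytes  # cropped bitmap of this region


@dataclass
class TranslatedRegion:
    kind: str
    text: str      # translated text; empty for images
    pixels: bytes  # original bitmap, kept for images and for reference


def translate_page(regions: List[Region],
                   ocr: Callable[[bytes], str],
                   translate: Callable[[str], str]) -> List[TranslatedRegion]:
    """OCR text/table regions, translate them with a text-only model, keep images as-is."""
    out: List[TranslatedRegion] = []
    for r in regions:
        if r.kind in ("text", "table"):
            source = ocr(r.pixels)  # e.g. extract the Chinese text
            out.append(TranslatedRegion(r.kind, translate(source), r.pixels))
        else:
            # Diagrams are never regenerated, so they cannot be damaged.
            out.append(TranslatedRegion(r.kind, "", r.pixels))
    return out
```

Because the diagrams are never redrawn, this avoids the damaged schematics seen in the tests, at the cost of losing the page layout, as noted above.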

Translating documents the way Nano Banana did above will probably require a lot of computing power in the future, because it actually generates the whole page anew.
#6 21796336 02 Jan 2026 12:53
Mateusz_konstruktor (Level 37)

gulson wrote:
Translating documents the way Nano Banana did above will probably require a lot of computing power in the future, because it actually generates the whole page anew.

In my opinion, artificial intelligence will also push China toward a standard of using English in electronic component documentation. By a roundabout route, but this is the aspect I find most important in this whole subject. There will be some rationalisation, though not in every case. Quite simply, English descriptions are far more useful when dealing with manufacturers, while Chinese is usually a huge obstacle.
#7 21796376 02 Jan 2026 13:18
gulson (System Administrator)

The idea of a global language has been around for a long time, and so far nothing has changed. The documentation in question could be released in English in parallel; now they have a tool that makes this simpler.
Or maybe it will even be translated "on the fly"?
#8 21796455 02 Jan 2026 14:13
p.kaczmarek2 (Moderator Smart Home, topic author)

@fachman1964 my impression is that their documentation is often incomplete and partly untranslated, even when they do make it available. It's similar in the SDKs for Beken and other IoT chips.

@MikeC I think your results are slightly better than mine. Could it be a different model than the one on LMArena?

@gulson to be more specific, it's Nano Banana Pro; the version without Pro is weak.

I am creating multiplatform open source firmware (Tasmota replacement), right now supporting BK7231T, BK7231N, XR809, BL602, W800, W600, LN882H and soon supporting RTL and W701:
https://github.com/openshwprojects/OpenBK7231T
If you like my work, support me at: https://paypal.me/openshwprojects

#9 21796459 02 Jan 2026 14:17
Mateusz_konstruktor (Level 37)

@gulson
It is not a global language, but the equivalent of a technical drawing, i.e. a language that is universal by design and understood by everyone regardless of their mother tongue.
English was and is used as an international language, but not a global one.
In my opinion, many Chinese manufacturers will shift to producing documentation in English instead of Chinese, although probably in a way that will be hard for the average end user to notice. Artificial intelligence will create pressure and de facto force a significant share of English. However, this will happen as a result of activity in a completely separate area: marketing.

There is already voice-to-text conversion, image search and online translation.
This appears to be the natural order of things.
The question is: will it again take a dozen or more years?
#10 21796468 02 Jan 2026 14:23
p.kaczmarek2 (Moderator Smart Home, topic author)

What I'm wondering is the actual cost of Nano Banana Pro reworking a single image, say, one of the cases from my tests. Has anyone seen such information anywhere? A quick web search turned up this post:

https://www.reddit.com/r/Bard/comments/1p7qel..._banana_pro_api_pricing_complete_breakdown_8/

That's the API price for us, but we don't know what it actually costs Google, or how much Google adds on top to make a profit...

#11 21796510 02 Jan 2026 14:58
Mateusz_konstruktor (Level 37)

@p.kaczmarek2
Any rates, even those actually paid, are not conclusive.
The decisive factor here is something other than the actual cost.
The end-user price results from many goals and considerations, with particular weight on factors set by whoever owns the tools. The situation is analogous to the price of the proverbial bread on the shop shelf. At first glance the conclusion is: "after all, they have to make money on it". However, selling bread may not only be unprofitable, it can be an additional cost. What's more, it can be a cost created openly and deliberately, for example to meet a business plan for cheese sales. Profit itself may not be the objective, or even of any interest. While discussing list prices can be interesting, it is important to distinguish between the two.
#12 21796619 02 Jan 2026 16:45
PPK (Level 30)

I suspect that with Asian "bushes" (characters), the AI instead searches the manufacturer's or other websites for a ready-made English translation... mainly for the complete ones...
#13 21796641 02 Jan 2026 17:11
MikeC (Level 32)

PPK wrote:
I suspect that with Asian "bushes" (characters), the AI instead searches the manufacturer's or other websites for a ready-made English translation... mainly for the complete ones...

It translates from these "bushes" without any problem...
#14 21800106 05 Jan 2026 23:33
p.kaczmarek2 (Moderator Smart Home, topic author)

Test with Polish.
translate all to polish

gpt-image-1.5

gemini-3-pro-image-preview-2k (nano-banana-pro)

#15 21800127 06 Jan 2026 00:26
Mateusz_konstruktor (Level 37)

Why do images two and three have the left and right sections cut off?
#16 21800141 06 Jan 2026 01:04
p.kaczmarek2 (Moderator Smart Home, topic author)

I think we've discussed this before, but this is simply how the AI generates it. Here is a short video of what it looks like on my end:

gpt-image-1.5 always cuts off images for me, whether I download or copy them.

Do you also experience this problem?

#17 21800166 06 Jan 2026 02:54
Mateusz_konstruktor (Level 37)

p.kaczmarek2 wrote:
Are you also experiencing this problem?

I run into this problem with incompatible settings or with incompatible web browsers themselves.
Parts of web pages sometimes get cut off as a result, especially with unusual settings.
Could you provide the address of a website where this can be verified?
#18 21800340 06 Jan 2026 10:51
p.kaczmarek2 (Moderator Smart Home, topic author)

Mateusz_konstruktor wrote:

Could you provide the address of a website where this can be verified?

The website address is the same as before, as given in the first paragraph:
p.kaczmarek2 wrote:
into English by sending it as an attachment with the prompt to the AI. I will use the free LMArena website to run the AI models:
https://lmarena.ai/

Mateusz_konstruktor wrote:

I run into this problem with incompatible settings or with incompatible web browsers themselves.
Parts of web pages sometimes get cut off as a result, especially with unusual settings.

I've been doing frontend and backend for about fifteen years and I've never seen a browser crop a downloaded (source) image; how would that even work? The browser does not crop the resource it requests with GET. Images can only appear clipped on the web page itself (e.g. in img tags), but opening the image directly (e.g. "open image in new tab") will still show it unclipped, unless the backend is already sending it clipped.

To clarify, I'm talking about the Response of the GET request:

The problem you describe could occur if I took a "print screen" instead of saving the image, but... that would take even more clicking than a simple "Save as".
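One way to settle whether the server or the browser does the cropping is to read the dimensions stored in the saved response body itself and compare them with what the page displays. A minimal sketch, assuming the file is a PNG (which format LMArena actually serves is not established in this thread): per the PNG specification, a file starts with an 8-byte signature followed by the IHDR chunk, whose first 8 data bytes hold width and height as big-endian 32-bit integers.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(data: bytes) -> tuple[int, int]:
    """Return (width, height) read from the IHDR chunk of a PNG file's bytes."""
    if len(data) < 24 or not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    # Bytes 8..11: IHDR data length, 12..15: chunk type, 16..23: width and height.
    if data[12:16] != b"IHDR":
        raise ValueError("IHDR chunk not found where expected")
    width, height = struct.unpack(">II", data[16:24])
    return width, height
```

For example, `png_size(open("saved.png", "rb").read())` on the file saved from the Response tab; if those dimensions already show the crop, the image arrived from the backend that way.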

#19 21800409 06 Jan 2026 11:47
Mateusz_konstruktor (Level 37)

p.kaczmarek2 wrote:
I've been doing frontend and backend for about fifteen years and I've never seen a browser crop a downloaded (source) image; how would that even work?

This could be related to the element's frame size depending on the screen size, with the dynamically generated image then being adjusted automatically. Sometimes a slightly unusual web browser or some setting is enough. I've come across this a few times on the websites of electronic component manufacturers from regions that are quite exotic to us.

And isn't there a limit on image size in this "free" variant?
Maybe exceeding the limit causes this clipping?
How about trying the same image, but saved at a lower resolution?
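Re-saving at a lower resolution amounts to scaling the image so its longer side fits a limit while keeping the aspect ratio. A small sketch of that arithmetic; `max_side` is a hypothetical parameter, since the thread does not establish whether LMArena's free tier has a size limit at all.

```python
def fit_within(width: int, height: int, max_side: int) -> tuple[int, int]:
    """Scale (width, height) down so the longer side is at most max_side,
    preserving the aspect ratio. Images that already fit are returned unchanged."""
    if width <= 0 or height <= 0 or max_side <= 0:
        raise ValueError("dimensions and limit must be positive")
    longer = max(width, height)
    if longer <= max_side:
        return width, height
    scale = max_side / longer
    # Round to whole pixels, but never let a side collapse to zero.
    return max(1, round(width * scale)), max(1, round(height * scale))
```

The actual resampling can then be done in any image tool; Pillow's `Image.thumbnail`, for instance, performs the same aspect-preserving downscale in place.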
#20 21800480 06 Jan 2026 12:35
willyvmm (Level 31)

If the attached images are exactly what AI got, then I'm not surprised.
Shit in => Shit out.

FAQ

TL;DR: In 3/3 trials, Nano Banana Pro led; “only the latest Nano Banana Pro seems to give acceptable results,” reports p.kaczmarek2. [Elektroda, p.kaczmarek2, post #21795582] Why it matters: Engineers can quickly read non‑English datasheet screenshots without manual OCR, saving evaluation time.

Quick Facts

- Test scope: 3 catalogue‑note screenshots translated via image prompts on LMArena; outputs judged for accuracy and artifacts. [Elektroda, p.kaczmarek2, post #21795582]
- Top performer: gemini‑3‑pro‑image‑preview (“Nano Banana Pro”) produced mostly correct text with minor typos. [Elektroda, p.kaczmarek2, post #21795582]
- Runners‑up: GPT‑Image 1.5 and ChatGPT‑Image (20251216) were “almost usable,” but weaker on long paragraphs. [Elektroda, p.kaczmarek2, post #21795582]
- Common failures: Flux variants often output meaningless letters; some models cropped pages or added backgrounds. [Elektroda, p.kaczmarek2, post #21795582]
- Practical takeaway: Screenshot‑to‑English is feasible today, yet artifacts persist; expect rapid model gains through 2026. [Elektroda, szeryf3, post #21796167]

What’s the best vision model right now for translating datasheet screenshots?

Nano Banana Pro (gemini‑3‑pro‑image‑preview) ranked first across three trials. It delivered mostly correct text with minor errors. The author concludes it’s the only model with acceptable results today. “Only the latest Nano Banana Pro seems to give acceptable results.” [Elektroda, p.kaczmarek2, post #21795582]

Are GPT‑Image models good enough for production use?

GPT‑Image 1.5 and ChatGPT‑Image (20251216) were almost acceptable. They handled titles and many labels but degraded in longer paragraphs. Expect typos and occasional diagram damage. Use them for quick triage, not final documentation. [Elektroda, p.kaczmarek2, post #21795582]

Why do some results look like nonsense letters or broken pages?

Several models, especially Flux variants, produced meaningless letter strings. Others hallucinated new backgrounds, overlaid the prompt text, or cropped images. Even the best models made most of their errors in longer paragraphs. [Elektroda, p.kaczmarek2, post #21795582]

Can I rely on AI to extract key specs from non‑English datasheets?

Yes, for quick reading of essentials. The thread shows usable translations from top models for LED drivers, display controllers, and rectifiers. You still must verify numbers and units before making design decisions. [Elektroda, p.kaczmarek2, post #21795582]
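Digits usually survive translation unchanged, so one cheap sanity check is to compare the numbers found in the original (for example via OCR) with those in the translation. A rough sketch using only the standard library; the regular expression is a simplification that ignores signs and thousands separators and splits ranges such as "3-5" into two numbers, so treat any mismatch as a prompt for manual review rather than proof of an error.

```python
import re
from collections import Counter

# Integers and decimals such as 3.3; no signs or thousands separators.
NUMBER = re.compile(r"\d+(?:\.\d+)?")


def numbers_in(text: str) -> Counter:
    """Multiset of numeric tokens appearing in the text."""
    return Counter(NUMBER.findall(text))


def number_mismatches(source: str, translation: str) -> tuple[Counter, Counter]:
    """Return (missing, extra): numbers lost in translation and numbers that appeared from nowhere."""
    src, dst = numbers_in(source), numbers_in(translation)
    return src - dst, dst - src
```

For a source line "VIN 4.5V~40V IOUT 3A" translated as "Input voltage 4.5V~48V, output current 3A", the check reports "40" as missing and "48" as extra.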

How do I translate a catalogue screenshot step‑by‑step?

- Capture a full‑page screenshot of the datasheet section.
- Upload it to LMArena (or similar) and prompt: “Translate to English; keep layout.”
- Review the output and manually correct units and diagrams. [Elektroda, p.kaczmarek2, post #21795582]

Why don’t some Chinese manufacturers ship English datasheets?

One user notes that AI makes such datasheets readable, yet wonders why bilingual PDFs are missing. He suggests the chips may be intended for the domestic Chinese market rather than for export. Use AI as a stopgap to access the information you need. [Elektroda, fachman1964, post #21796137]

Will translation quality improve soon?

Community expectation is strong. As one poster said, progress is daily and visible improvements should arrive by year’s end. “There will be a visible difference.” [Elektroda, szeryf3, post #21796167]

Why do two runs of the same model produce different results?

Generations include randomness (seed). The author saw runs vary, with models sometimes better or worse. Repeat runs can help, but validate each output. [Elektroda, p.kaczmarek2, post #21795582]

Which failures should I watch for before trusting a translation?

Edge cases include overlaid prompt text, cropped diagrams, short‑circuited graphics between figures, and typos like “Synchornous.” Always compare to the original image. [Elektroda, p.kaczmarek2, post #21795582]
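Near-miss typos such as “Synchornous” are easy to overlook by eye. A sketch that flags words which are close to, but not exactly, a term from a small glossary, using the standard library's `difflib.get_close_matches`; the glossary contents and the 0.8 similarity cutoff are illustrative assumptions to tune for your own parts.

```python
import difflib
import re


def flag_near_misses(text: str, glossary: list[str], cutoff: float = 0.8) -> list[tuple[str, str]]:
    """Return (word, suggested_term) pairs for words that look like misspelled glossary terms."""
    terms = {t.lower(): t for t in glossary}
    flagged = []
    for word in re.findall(r"[A-Za-z]+", text):
        lw = word.lower()
        if lw in terms:
            continue  # exact (case-insensitive) match, nothing to report
        match = difflib.get_close_matches(lw, list(terms), n=1, cutoff=cutoff)
        if match:
            flagged.append((word, terms[match[0]]))
    return flagged
```

Called on the Test 3 output with a glossary containing "Synchronous", it would report ("Synchornous", "Synchronous"); correctly spelled words outside the glossary are not reported unless they happen to resemble a term.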

Is LMArena required, or can I use other platforms?

The tests used LMArena to run many models. Any platform that accepts image prompts and returns edited images or overlays can work similarly. [Elektroda, p.kaczmarek2, post #21795582]

What models should I avoid for screenshot translation today?

Avoid Flux‑2‑Max/Pro variants for this task; results were meaningless letters. Also avoid qwen‑image‑edit and seedream‑4‑high‑res‑fal due to image damage. [Elektroda, p.kaczmarek2, post #21795582]

Do multiple models agree on the same page?

Not always. One user showed ChatGPT 5.2 producing different stylistic outputs, confirming variability between and within models. Cross‑check critical lines. [Elektroda, MikeC, post #21796237]

What’s a realistic success rate based on this thread?

One model ranked as “acceptable,” two as “almost acceptable,” and several failed. That’s roughly 30% usable among highlighted models in the ranking. Treat as guidance, not a benchmark. [Elektroda, p.kaczmarek2, post #21795582]
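The roughly 30% figure can be reproduced from the ranking in the opening post. In this sketch the group labels are shorthand for the author's descriptions, and only the ten models that appear in the ranking are counted (models tested but not ranked, such as seedream-4.5, are excluded).

```python
# Ranking groups from the opening post; the labels are shorthand, the model names are copied.
RANKING: dict[str, list[str]] = {
    "acceptable": ["gemini-3-pro-image-preview (nano-banana-pro)"],
    "almost acceptable": ["chatgpt-image-latest (20251216)", "gpt-image-1.5"],
    "inconsistent": ["reve-v1.1", "gpt-image-1"],
    "meaningless letters": ["flux-2-max", "flux-2-pro", "flux-2-pro-20251231"],
    "hallucinates layout": ["seedream-4-high-res-fal"],
    "damages image": ["qwen-image-edit"],
}

USABLE = {"acceptable", "almost acceptable"}


def usable_share(ranking: dict[str, list[str]], usable: set[str]) -> float:
    """Fraction of ranked models that fall into a usable group."""
    total = sum(len(models) for models in ranking.values())
    ok = sum(len(models) for group, models in ranking.items() if group in usable)
    return ok / total


print(f"{usable_share(RANKING, USABLE):.0%}")  # prints 30% (3 of 10 ranked models)
```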

Where does AI help most with catalogue notes?

Great for quick headline translation, pin/function labels, and block‑diagram captions. Less reliable for long prose and dense tables. Use human review for specs. [Elektroda, p.kaczmarek2, post #21795582]

Any expert takeaway I can act on today?

Start with Nano Banana Pro for screenshots. If it struggles, try GPT‑Image 1.5, then ChatGPT‑Image (20251216). Always verify critical numbers. [Elektroda, p.kaczmarek2, post #21795582]
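That order can be expressed as a simple fallback chain. This is a sketch under assumptions: `translate(model, image)` and `acceptable(text)` are hypothetical callables (the check could be a human review or automated tests such as the number and typo checks), and the model names are LMArena display labels, which may differ from identifiers in any real API.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical: translate(model_name, image_bytes) -> translated text.
Translate = Callable[[str, bytes], str]

# Order suggested in the answer above (LMArena display names, not confirmed API IDs).
ORDER = ["gemini-3-pro-image-preview", "gpt-image-1.5", "chatgpt-image-latest"]


def translate_with_fallback(image: bytes, models: Sequence[str], translate: Translate,
                            acceptable: Callable[[str], bool]) -> Optional[Tuple[str, str]]:
    """Try models in order; return (model, text) for the first acceptable output, else None."""
    for model in models:
        text = translate(model, image)
        if acceptable(text):
            return model, text
    return None  # every model failed the check: fall back to a human translator
```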

What prompt helps preserve layout?

Use concise directives: “Translate to English; keep layout; don’t add backgrounds; keep diagrams intact.” The author warns that some models rearrange pages, so constrain edits. [Elektroda, p.kaczmarek2, post #21795582]
