Vision-based AI models for translating catalogue notes - we test Nano Banana, ChatGPT-Image and othe

p.kaczmarek2 1257 19

TL;DR

  • Several vision-based AI models were tested on translating foreign-language catalogue-note screenshots of electronic part specifications into English.
  • The method used bitmap screenshots, not PDFs, sent as image attachments through LMArena with the prompt “translate to english”.
  • One benchmark used chatgpt-image-latest (20251216), gpt-image-1.5, flux-2-pro, seedream-4-high-res-fal, and gemini-3-pro-image-preview (nano-banana-pro).
  • gemini-3-pro-image-preview (nano-banana-pro) produced the best results, with text almost all correct and only occasional errors and typos.
  • Most other models were useless, often hallucinating backgrounds, cropping images, or outputting meaningless letters instead of translations.
Generated by the language model.
  • Comparison of two screenshots translating L1050 datasheet using different AI models
    Is a datasheet (catalogue note) in a foreign language still a problem in 2026? Today I will test whether artificial intelligence can replace a translator and translate the text of electronic component specifications into English. Importantly, the whole experiment is based on screenshots (bitmaps) rather than PDF files, so the AI cannot make its job easier by reading the text directly. Will this form of specification translation be of any use? Let's find out!

    The testing methodology is very simple: I take a screenshot of the datasheet and try to have it translated into English by sending it to the AI as an attachment together with a prompt. I use the free LMArena website to run the AI models:
    https://lmarena.ai/
    After testing, I will try to subjectively evaluate and group the models according to their results.

    Note: I have posted the images exactly as generated by the AI. If something is cropped, that is the bitmap the model produced.

    Trial 1 - constant-current LED controller
    Inputs:
    Chinese L1050 datasheet page with description, features, and pin layout
    translate to english



    seedream-4-high-res-fal
    L1050 IC on blue datasheet background with technical description and pin annotation
    Such a grand hallucination rather rules out this model.

    reve-v1.1-fast
    Screenshot of a Chinese LED driver datasheet for model L1050 with feature list and pinout diagram
    No translations.

    reve-v1.1
    L1050 datasheet page with tables, description, and IC pinout diagram
    Major unnecessary reworking of document, residual translations.

    chatgpt-image-latest (20251216)
    Screenshot of LED driver L1050 datasheet with features and pin diagram
    Slightly better, occasional typos. Almost usable.

    gpt-image-1.5
    L1050 LED driver datasheet with text errors and pinout diagram
    Slightly better, occasional typos. Almost usable.

    flux-1-context-pro
    Screenshot of a Chinese datasheet for L1050 IC in SOP-16 package
    No translations.

    flux-2-flex
    Screenshot of L1050 datasheet with description, features list, and 16-pin package diagram.
    Virtually no translations, except for the title above the document.

    flux-2-flex-20251231
    Screenshot of a datasheet for the L1050 LED driver chip
    Translation failure - random letters and stamps, mostly no translations.

    qwen-image-edit
    Mosaic of small pastel-colored squares in turquoise, pink, yellow, and blue tones
    Screenshot of the L1050 datasheet with Chinese text and a pin configuration diagram.
    This model hallucinates a strange background and is able to spoil the document. Useless.


    flux-2-max
    Screenshot of L1050 LED driver datasheet with English and Chinese text sections
    Only traces of translation; most of the text is a meaningless string of letters. The headline was translated.

    flux-2-pro
    Screenshot of L1050 datasheet showing description, features, ordering, and marking info
    Nonsensical strings of letters, useless result.

    flux-2-pro-20251231
    Screenshot of L1050 datasheet with garbled English and Chinese text and SOP-16 pin layout
    Screenshot of L1050 datasheet with mistranslated English text on a blue background.
    Nonsense strings of letters, useless result.

    gemini-2.5-flash-image-preview (nano-banana)
    Screenshot of L1050 LED driver datasheet with description, features, and SOP-16 package diagram
    Surprisingly, the older Nano Banana did not translate anything.

    gemini-3-pro-image-preview (nano-banana-pro)
    L1050 LED IC datasheet with description, features, and SOP-16 pinout diagram
    Best result so far. Almost all of the text is correct, with occasional errors and typos; only in the longer paragraphs are some words nonsense.

    Trial 2 - display controller
    FD650 controller diagram with LED and keyboard interface specifications and pin layout
    translate to english
    gpt-image-1-mini
    Screenshot of FD650 datasheet with Chinese text and a pinout diagram labeled FD-950
    Model reworked image, renamed layout, not translated.

    gemini-2.5-flash-image-preview (nano-banana)
    Diagram and description of the FD650 chip functions in Chinese.
    The older Nano Banana could not cope with the translation.

    flux-2-pro
    Datasheet page for FD650 IC showing features list and 16-pin diagram.
    The model has attempted a translation but the result is unreadable, virtually only the title is helpful - LED Driver/Keyboard Scan.

    flux-2-pro-20251231
    Screenshot of FD650 datasheet with garbled English text and pinout diagram
    Flux 2 deciphered the main keywords, but the rest is useless.


    flux-1-context-pro
    Screenshot of FD650 datasheet with Chinese text and pinout diagram labeled “Translate to english”
    This model superimposed the caption "Translate to english" on the image.



    flux-2-flex-20251231
    FD650 datasheet excerpt with block description and pinout diagram
    Residual translations.

    gpt-image-1.5
    FD650 datasheet page with IC overview, features list, and pinout diagram
    At first glance very good, but the introduction from the second/third sentence onwards fell apart.

    reve-v1.1
    FD650 datasheet excerpt with features list and pin diagram
    The basics were translated, but the model also damaged the pinout diagram.

    seedream-4-high-res-fal
    Screenshot of FD650 datasheet showing functions and a pinout diagram
    The title may be translated, but the model has added some strange background.

    chatgpt-image-latest (20251216)
    Screenshot of FD650 datasheet with device features and pinout diagram
    Like the other OpenAI model, it's not bad, but the introduction falls apart after the first sentences. In addition, I see a slightly damaged pinout diagram.

    gemini-3-pro-image-preview (nano-banana-pro)
    FD650 datasheet page showing description, features, and pinout diagram
    Nano Banana Pro has again performed very well.

    Trial 3 - synchronous rectifier
    This time a trial with a screenshot:
    Circuit diagram and Chinese-language description of MT6706BL synchronous rectifier
    translate to english

    qwen-image-edit
    Synchronous rectifier circuit diagram with MT6706L and MT6706BL ICs
    Useless result.

    chatgpt-image-latest (20251216)
    Schematic with MT6706BL chip and description of flyback synchronous controller operation
    The basic translation is there, but with lots of typos. Synchornous?

    gpt-image-1.5
    MT6706BL circuit diagram with incorrect OCR-translated English technical text
    Same as the previous GPT.

    seedream-4-high-res-fal
    Rectifier diagram with MT6706BL IC and distorted section of translated specification
    This model has redone the background again....

    gpt-image-1
    Screenshot of MT6760BL datasheet showing descriptive text and circuit diagrams.
    Fragment of datasheet with technical text and MT3706BL block diagrams
    Residual translation. In addition, with another attempt I received a strangely cropped image.

    gpt-image-1-mini
    Diagram of two MT6706BL circuits with bridge rectifiers and capacitors
    And what happened here? A short circuit? And between two separate diagrams, at that... In addition, the model also cropped the picture.

    flux-2-flex
    Flyback power supply schematic with MT6706BL controller and technical description text.
    Again, residual translation.

    gemini-2.5-flash-image-preview (nano-banana)
    Rectifier circuit diagrams using MT6706BL with Chinese text explanation.
    No translation.

    seedream-4.5
    Application diagram for MT6706BL synchronous rectifier with technical description in English.
    Application circuit diagram of MT6706BL used in synchronous rectification
    It came out slightly better this time, but there are still shortcomings.

    flux-1-context-pro
    Block diagram with MT6706BL IC and Chinese-language functional description
    No translation.

    flux-2-pro
    Rectifier schematics with MT6706BL chip and distorted English title and paragraph
    The letters have been changed, but they make no sense.

    gemini-3-pro-image-preview (nano-banana-pro)
    Another success story for Nano Banana Pro.


    Final ranking of vision models
    I ran additional tests but did not post more images in the topic, because a thread full of near-identical nonsense graphics would be unreadable. In the end, I grouped the models according to my overall impression, although I noticed that a particular model may occasionally do better or worse; generation probably involves some randomness (the so-called seed).
    Correct translations, occasional errors:
    - gemini-3-pro-image-preview (nano-banana-pro)
    Almost acceptable translations, but problems with some words and blurred letters:
    - chatgpt-image-latest (20251216)
    - gpt-image-1.5
    Sometimes translates something, sometimes hallucinates and creates nonsense:
    - reve-v1.1
    - gpt-image-1
    Token translation attempts, meaningless strings of letters:
    - flux-2-max
    - flux-2-pro
    - flux-2-pro-20251231
    Tries to translate something, but hallucinates and rearranges images:
    - seedream-4-high-res-fal
    Can ruin the image:
    - qwen-image-edit
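
    The grouping above can also be written down as data, which makes it easy to look up a model's tier or to count how many ranked models landed in the better groups. A minimal sketch in Python; the short tier names are paraphrases of the headings above, and `TIERS`, `USABLE`, `tier_of` and `usable_share` are illustrative names, not anything used in the tests.

```python
# Subjective tiers from the ranking above, best first.
# The short tier names are paraphrases chosen for this sketch.
TIERS = {
    "correct, occasional errors": [
        "gemini-3-pro-image-preview (nano-banana-pro)",
    ],
    "almost acceptable": [
        "chatgpt-image-latest (20251216)",
        "gpt-image-1.5",
    ],
    "hit or miss": ["reve-v1.1", "gpt-image-1"],
    "meaningless letters": [
        "flux-2-max",
        "flux-2-pro",
        "flux-2-pro-20251231",
    ],
    "hallucinates and rearranges": ["seedream-4-high-res-fal"],
    "can ruin the image": ["qwen-image-edit"],
}

# Tiers counted as usable in this sketch (an assumption made for illustration).
USABLE = ("correct, occasional errors", "almost acceptable")


def tier_of(model: str) -> str:
    """Return the tier name for a model, or raise KeyError if it is not ranked."""
    for tier, models in TIERS.items():
        if model in models:
            return tier
    raise KeyError(model)


def usable_share() -> float:
    """Fraction of ranked models that fall into the USABLE tiers."""
    total = sum(len(models) for models in TIERS.values())
    usable = sum(len(TIERS[tier]) for tier in USABLE)
    return usable / total


if __name__ == "__main__":
    print(tier_of("gpt-image-1.5"))
    print(f"{usable_share():.0%} of ranked models are at least almost acceptable")
```

    Because the ranking is subjective and runs vary, the share computed here only summarises the grouping; it is not a benchmark score.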

    In summary, only the latest Nano Banana Pro seems to give acceptable results when translating datasheet images, although it still produces occasional artefacts. Just behind it are GPT-Image 1.5 and ChatGPT-Image (20251216), but they are no match for it. The rest of the models are useless: some try to remake the image and some ignore the text completely.
    AI does not seem to be far from handling this task well. I expect that as early as 2026 there will be models that handle such translations even better, and even if not, Nano Banana Pro is already satisfactory.
    Do you see a use for artificial intelligence in the role of an image translator? Or do you know of other practical applications for the Nano Banana Pro and similar models?

  • #2 21796137
    fachman1964
    That's really quite good for a machine. It will still take some time before it reaches perfection. Nevertheless, you can get the information you need from such translations, which is definitely better than Chinese "bushes" (characters). It has always puzzled me why some Chinese manufacturers do not publish datasheets in two languages from the start. Maybe such chips are intended not for export but for the Chinese market? I doubt it is because they don't know English.
  • #3 21796167
    szeryf3
    Artificial intelligence is learning by the day, and I suspect that by the end of this year there will be a visible difference in this area.
    It wasn't so long ago that this was black magic, and now people can't figure out many things without sending a query to the AI.
  • #4 21796237
    MikeC
    My ChatGPT 5.2 did it differently again:
    Technical flyer of L1050 LED driver IC with description, features, and pin configuration

    And this one into English and pseudo-Polish:

    Application diagrams using MT6706BL controller in flyback rectifier circuits
    Application diagram of MT6706BL controller with bridge rectifier and DC output
  • #5 21796292
    gulson
    System Administrator
    Nano banana the best, as usual.

    With hundreds of pages, however, it is best to run OCR on the document to extract the Chinese text, and then translate it with a language model, this time without vision.
    Of course, the text will be plain and will lose its layout, but the efficiency is very high.
    It is also possible to first use a model that splits each PDF page into images, tables and text, and then feed those separately into models for translation.
    This is also the cheapest solution for many pages.
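
    The OCR-first approach described above can be sketched as a two-stage pipeline. This is only an outline under stated assumptions: `ocr` and `translate` are hypothetical callables standing in for whatever OCR engine and text-only model are actually used, and pages are passed as raw image bytes.

```python
from typing import Callable, Iterable, List

# Both callables are placeholders (assumptions of this sketch):
# `ocr` turns a page image (raw bytes) into source-language text,
# `translate` turns source-language text into English with a text-only model.
OcrFn = Callable[[bytes], str]
TranslateFn = Callable[[str], str]


def translate_document(
    pages: Iterable[bytes], ocr: OcrFn, translate: TranslateFn
) -> List[str]:
    """OCR each page, then translate the extracted text; layout is not preserved."""
    translated = []
    for page in pages:
        text = ocr(page).strip()
        # Skip the model call for pages where OCR found nothing (e.g. pure drawings).
        translated.append(translate(text) if text else "")
    return translated


if __name__ == "__main__":
    # Stand-ins so the sketch runs without any OCR engine or model.
    def fake_ocr(page: bytes) -> str:
        return page.decode("utf-8")

    def fake_translate(text: str) -> str:
        return f"[EN] {text}"

    print(translate_document([b"page one text", b""], fake_ocr, fake_translate))
```

    Keeping the two stages as separate functions makes it possible to swap the OCR engine or the language model independently, which matters when cost per page is the main concern.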

    In the future, translating documents the way Nano Banana did above will probably require a lot of computing power, because it is actually generating the whole page anew.
  • #6 21796336
    Mateusz_konstruktor
    gulson wrote:
    In the future, translating documents the way Nano Banana did above will probably require a lot of computing power, because it is actually generating the whole page anew.

    In my opinion, artificial intelligence will also lead to a standard in China for the use of English in electronic component documentation. By a circuitous route, but nevertheless this is the aspect I see most in this whole subject. There will be some rationalisation, although not in all cases. Quite simply, descriptions in English are much more useful when dealing with manufacturers, and usually the Chinese language is a huge impediment.
  • #7 21796376
    gulson
    System Administrator
    The idea of a global language has been around for quite a long time, and so far nothing has changed. The documentation in question could be released in English in parallel; now they have a tool that makes this simpler.
    Or maybe it will even be translated "on the fly"?
  • #8 21796455
    p.kaczmarek2
    Moderator Smart Home
    @fachman1964 my impression is that their documentation is often rudimentary and untranslated, even when they do publish it. It is similar with the SDKs for Beken and other IoT chips.

    @MikeC I think your results are slightly better than what I got. Could it be a different model than the one on LMArena?

    @gulson to be precise, it's Nano Banana Pro; the version without the Pro is weak.
  • #9 21796459
    Mateusz_konstruktor
    @gulson
    It is not about a global language, but about the equivalent of a technical drawing, i.e. a language that is universal by design and understood by everyone regardless of their mother tongue.
    English was and is used as an international language, but without the "global" attribute.
    In my opinion, many Chinese manufacturers will shift to producing documentation in English instead of Chinese, although probably in a way that the average end-user will hardly notice. Artificial intelligence will create pressure and de facto force a significant share of English. However, this will happen as a result of activity in a completely separate area: marketing.

    There is already voice-to-text conversion, image search and online translation.
    This appears to be the natural order of things.
    The question: will it again take a dozen or more years?
  • #10 21796468
    p.kaczmarek2
    Moderator Smart Home
    What I am wondering about is the actual cost of reworking one image like this with Nano Banana Pro, say, in the situation from my presentation. Has anyone seen such information anywhere? A quick web search turned up this post:
    Screenshot of Reddit post showing Nano Banana Pro API pricing and comparison with other image APIs
    https://www.reddit.com/r/Bard/comments/1p7qel..._banana_pro_api_pricing_complete_breakdown_8/

    That is the API price (for us); we don't know how much it actually costs Google, or how much margin Google adds on top to make a profit...
  • #11 21796510
    Mateusz_konstruktor
    @p.kaczmarek2
    No rates, not even those actually paid, are authoritative.
    Here, the decisive factor is something other than the actual cost.
    The end-user price is the outcome of a pricing policy that balances many objectives and considerations, shaped above all by those who own the tools. The situation is analogous to the price of the proverbial bread on the shop shelf. At first glance we conclude: "after all, they have to make money on it". However, selling bread may be not only unprofitable but an additional cost. On top of that, it can be a cost that is created openly and deliberately, for example to meet a business plan for selling cheese. Profit itself may not be the objective, nor even of interest. While it may be interesting to discuss list prices, it is important to distinguish them from actual costs.
  • #12 21796619
    PPK
    It seems to me that in the case of Asian "bushes" (characters), the AI rather searches the manufacturer's or other sites for a ready-made English translation... mainly for the complete ones...
  • #13 21796641
    MikeC
    PPK wrote:
    It seems to me that in the case of Asian "bushes" (characters), the AI rather searches the manufacturer's or other sites for a ready-made English translation... mainly for the complete ones...

    It translates from these bushes without any problem ...
  • #14 21800106
    p.kaczmarek2
    Moderator Smart Home
    Test with Polish.
    translate all to polish
    L1050 chip datasheet showing features, description, ordering info, and SOP-16 layout

    gpt-image-1.5
    Technical datasheet for L1050 LED driver with features, description, and package diagram
    L1050 LED driver datasheet with features, order info, and pinout diagram on blue background

    gemini-3-pro-image-preview-2k (nano-banana-pro)
    Diagram of L1050 IC with technical data and feature summary
    Diagram and specification sheet of LED driver L1050 with technical and ordering info
  • #15 21800127
    Mateusz_konstruktor
    Why do images two and three have the left and right sections cut off?
  • #16 21800141
    p.kaczmarek2
    Moderator Smart Home
    I think we've talked about this before, but this is simply how the AI generates. Below is a short video of what it looks like on my side:



    gpt-image-1.5 always crops images for me, whether I download or copy them.

    Do you also experience this problem?
  • #17 21800166
    Mateusz_konstruktor
    p.kaczmarek2 wrote:
    Are you also experiencing this problem?

    I encounter such a problem in cases of incompatible settings or incompatible web browsers themselves.
    Parts of web pages sometimes get cut off as a result, and this happens especially with unusual settings.
    Could you provide the address of a website where this can be verified?
  • #18 21800340
    p.kaczmarek2
    Moderator Smart Home
    Mateusz_konstruktor wrote:

    Could you provide the address of a website where this can be verified?

    The website address is the same all the time, as in the first paragraph:
    p.kaczmarek2 wrote:
    into English by sending it as an attachment with the prompt to the AI. I will use the free LMArena website to run the AI models:
    https://lmarena.ai/


    Mateusz_konstruktor wrote:

    I encounter such a problem in cases of incompatible settings or incompatible web browsers themselves.
    Parts of web pages sometimes get cut off as a result, and this happens especially with unusual settings.

    I've been doing frontend and backend for about fifteen years and I have never seen a browser crop a downloaded (source) image; how would that even work? The browser does not perform a separate crop operation on a resource it fetches with a GET request. An image can appear clipped on the web page itself (e.g. in an img tag), but opening the image resource directly will still show the unclipped image, unless the backend is already sending it cropped.

    To clarify - I'm talking about the Response field from the GET request:
    Vision-based AI models for translating catalogue notes - we test Nano Banana, ChatGPT-Image and othe

    The problem you write about could exist if I did a "print screen" instead of saving the image, but.... after all, that would even be more clicking than a simple "Save as".
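
    One way to check whether the saved file itself is cropped, independent of how any browser displays it, is to read the pixel dimensions straight from the file bytes and compare them with the input screenshot. A minimal sketch, assuming both files are PNGs (the thread does not say which format LMArena returns); `png_size` and `same_aspect` are illustrative helper names.

```python
import struct
from typing import Tuple

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(data: bytes) -> Tuple[int, int]:
    """Return (width, height) read from the IHDR chunk of a PNG file."""
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    # Bytes 16..23 hold width and height as big-endian unsigned 32-bit integers.
    width, height = struct.unpack(">II", data[16:24])
    return width, height


def same_aspect(a: Tuple[int, int], b: Tuple[int, int], tolerance: float = 0.01) -> bool:
    """True if two (width, height) pairs have the same aspect ratio within tolerance."""
    ratio_a = a[0] / a[1]
    ratio_b = b[0] / b[1]
    return abs(ratio_a - ratio_b) <= tolerance * ratio_a


if __name__ == "__main__":
    # Build a minimal PNG header in memory instead of reading a real file.
    header = PNG_SIGNATURE + struct.pack(">I", 13) + b"IHDR" + struct.pack(">II", 1024, 768)
    print(png_size(header))
    print(same_aspect((1024, 768), (512, 384)))
```

    If the output file has a different aspect ratio than the input, that suggests the reshaping happened before the file reached the browser, although it cannot tell cropping apart from other resizing by the model.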
  • #19 21800409
    Mateusz_konstruktor
    p.kaczmarek2 wrote:
    I've been doing frontend and backend for fifteen years and I haven't encountered a browser clipping a downloaded (source) image, how could that work?

    This could be related to the frame size of the element, which depends on the screen size, with the dynamically generated image then being adjusted automatically. Sometimes a slightly unusual web browser or some setting is enough. I have encountered this a few times on the websites of electronic component manufacturers from regions that are quite exotic to us.

    And isn't there, in this "free" variant, a limit on the size of the images?
    Maybe exceeding the limit causes this clipping?
    How about trying it with the same image, but saved at a lower resolution?
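
    To try the lower-resolution idea while keeping the page proportions, the target size can be computed first. A minimal sketch; the 1024-pixel cap in the usage below is purely hypothetical, since the thread does not state any size limit, and `fit_within` is an illustrative name.

```python
from typing import Tuple


def fit_within(width: int, height: int, max_side: int) -> Tuple[int, int]:
    """Scale (width, height) down so the longer side is at most max_side.

    The aspect ratio is kept; images that already fit are returned unchanged.
    """
    if width <= 0 or height <= 0 or max_side <= 0:
        raise ValueError("dimensions must be positive")
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    scale = max_side / longest
    # Round to whole pixels, but never collapse a side to zero.
    return max(1, round(width * scale)), max(1, round(height * scale))


if __name__ == "__main__":
    # Hypothetical cap of 1024 px on the longer side.
    print(fit_within(2000, 1000, 1024))
```

    If the cropping disappears at the smaller size, a size limit becomes a plausible explanation; if it persists, the cause lies elsewhere.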
  • #20 21800480
    willyvmm
    If the attached images are exactly what AI got, then I'm not surprised.
    Shit in => Shit out.

FAQ

TL;DR: In 3/3 trials, Nano Banana Pro led; “only the latest Nano Banana Pro seems to give acceptable results,” reports p.kaczmarek2. [Elektroda, p.kaczmarek2, post #21795582] Why it matters: Engineers can quickly read non‑English datasheet screenshots without manual OCR, saving evaluation time.

What’s the best vision model right now for translating datasheet screenshots?

Nano Banana Pro (gemini‑3‑pro‑image‑preview) ranked first across three trials. It delivered mostly correct text with minor errors. The author concludes it’s the only model with acceptable results today. “Only the latest Nano Banana Pro seems to give acceptable results.” [Elektroda, p.kaczmarek2, post #21795582]

Are GPT‑Image models good enough for production use?

GPT‑Image 1.5 and ChatGPT‑Image (20251216) were almost acceptable. They handled titles and many labels but degraded in longer paragraphs. Expect typos and occasional diagram damage. Use them for quick triage, not final documentation. [Elektroda, p.kaczmarek2, post #21795582]

Why do some results look like nonsense letters or broken pages?

Several models, especially Flux variants, produced meaningless letter strings. Others hallucinated new backgrounds, overlaid prompts, or cropped images. These are typical vision‑OCR failure modes when text is dense or stylized. [Elektroda, p.kaczmarek2, post #21795582]

Can I rely on AI to extract key specs from non‑English datasheets?

Yes, for quick reading of essentials. The thread shows usable translations from the top model for an LED driver, a display controller, and a synchronous rectifier. You still must verify numbers and units before design decisions. [Elektroda, p.kaczmarek2, post #21795582]

How do I translate a catalogue screenshot step‑by‑step?

  1. Capture a full‑page screenshot of the datasheet section.
  2. Upload it to LMArena (or similar) and prompt: “Translate to English; keep layout.”
  3. Review output and manually correct units and diagrams. [Elektroda, p.kaczmarek2, post #21795582]

Why don’t some Chinese manufacturers ship English datasheets?

One user notes many are readable via AI, yet wonders about missing bilingual PDFs. Possible reasons include domestic focus and resource limits. Use AI as a stopgap to access needed info. [Elektroda, fachman1964, post #21796137]

Will translation quality improve soon?

Community expectation is strong. As one poster said, progress is daily and visible improvements should arrive by year’s end. “There will be a visible difference.” [Elektroda, szeryf3, post #21796167]

Why do two runs of the same model produce different results?

Generations include randomness (seed). The author saw runs vary, with models sometimes better or worse. Repeat runs can help, but validate each output. [Elektroda, p.kaczmarek2, post #21795582]

Which failures should I watch for before trusting a translation?

Edge cases include overlaid prompt text, cropped diagrams, short‑circuited graphics between figures, and typos like “Synchornous.” Always compare to the original image. [Elektroda, p.kaczmarek2, post #21795582]
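
Typos of the "Synchornous" kind can be flagged automatically if the translated text is available as a string. A minimal sketch using Python's standard `difflib`; the glossary of expected terms and the `flag_near_misses` name are assumptions made for illustration, and the 0.85 cutoff is an arbitrary starting point.

```python
import difflib
import re
from typing import List, Sequence, Tuple


def flag_near_misses(
    text: str, glossary: Sequence[str], cutoff: float = 0.85
) -> List[Tuple[str, str]]:
    """Return (word, expected_term) pairs for words that almost match a glossary term."""
    terms = [term.lower() for term in glossary]
    flagged = []
    for word in re.findall(r"[A-Za-z]+", text):
        lowered = word.lower()
        if lowered in terms:
            continue  # spelled exactly as expected
        close = difflib.get_close_matches(lowered, terms, n=1, cutoff=cutoff)
        if close:
            flagged.append((word, close[0]))
    return flagged


if __name__ == "__main__":
    glossary = ["synchronous", "rectifier", "flyback", "controller"]
    print(flag_near_misses("Synchornous rectifier controller", glossary))
```

This only catches misspellings of terms that are already in the glossary; it does not replace comparing the output with the original image.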

Is LMArena required, or can I use other platforms?

The tests used LMArena to run many models. Any platform that accepts image prompts and returns edited images or overlays can work similarly. [Elektroda, p.kaczmarek2, post #21795582]

What models should I avoid for screenshot translation today?

Avoid Flux‑2‑Max/Pro variants for this task; results were meaningless letters. Also avoid qwen‑image‑edit and seedream‑4‑high‑res‑fal due to image damage. [Elektroda, p.kaczmarek2, post #21795582]

Do multiple models agree on the same page?

Not always. One user showed ChatGPT 5.2 producing different stylistic outputs, confirming variability between and within models. Cross‑check critical lines. [Elektroda, MikeC, post #21796237]

What’s a realistic success rate based on this thread?

One model ranked as “acceptable,” two as “almost acceptable,” and several failed. That’s roughly 30% usable among highlighted models in the ranking. Treat as guidance, not a benchmark. [Elektroda, p.kaczmarek2, post #21795582]

Where does AI help most with catalogue notes?

Great for quick headline translation, pin/function labels, and block‑diagram captions. Less reliable for long prose and dense tables. Use human review for specs. [Elektroda, p.kaczmarek2, post #21795582]

Any expert takeaway I can act on today?

Start with Nano Banana Pro for screenshots. If it struggles, try GPT‑Image 1.5, then ChatGPT‑Image (20251216). Always verify critical numbers. [Elektroda, p.kaczmarek2, post #21795582]

What prompt helps preserve layout?

Use concise directives: “Translate to English; keep layout; don’t add backgrounds; keep diagrams intact.” The author warns that some models rearrange pages, so constrain edits. [Elektroda, p.kaczmarek2, post #21795582]
Generated by the language model.