Adding OCR Languages in VueScan

Ed Hamrick

VueScan 9.8.35 and later has built-in Optical Character Recognition (OCR) for English, Spanish, German, French and Italian.

VueScan 9.8.34 and earlier uses Google’s Tesseract 3; VueScan 9.8.35 and later uses Tesseract 5.

The table below lists 44 languages. To use one that is not built in, download its ocr_xx.bin file (for VueScan 9.8.34 and earlier) or its xxx.traineddata file (for VueScan 9.8.35 and later).

Each file contains data about the character set used in its language, so OCR results improve when the matching file is installed.
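The version cutoff described above can be sketched as a small helper. This is only an illustration in Python and not part of VueScan: the names `parse_version` and `language_file_name` are hypothetical, and the two- and three-letter codes have to be taken from the table of supported languages.

```python
def parse_version(version: str) -> tuple:
    """Turn a version string such as '9.8.35' into (9, 8, 35) for comparison."""
    return tuple(int(part) for part in version.split("."))


def language_file_name(version: str, two_letter: str, three_letter: str) -> str:
    """Return the file name that matches the given VueScan version.

    Hypothetical helper: VueScan 9.8.35 and later (Tesseract 5) use
    xxx.traineddata; 9.8.34 and earlier (Tesseract 3) use ocr_xx.bin.
    """
    if parse_version(version) >= (9, 8, 35):
        return f"{three_letter}.traineddata"
    return f"ocr_{two_letter}.bin"


# Dutch is listed as ocr_nl.bin / nld.traineddata in the language table.
print(language_file_name("9.8.34", "nl", "nld"))  # ocr_nl.bin
print(language_file_name("9.8.35", "nl", "nld"))  # nld.traineddata
```

Comparing versions as integer tuples avoids the pitfall of comparing them as strings, where "9.8.4" would sort after "9.8.35".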

To add support for additional languages in the “Output | OCR text language” option, you need to download a language-specific file. Store this file on your hard drive in one of the following locations:

Operating System                   Download Location
macOS                              /Users/Shared
Windows (VueScan 9.1 and earlier)  c:\vuescan
Windows (VueScan 9.2 and later)    Same location as vuescan.log, or c:\Program Files\VueScan
Linux                              Same location as vuescan.log, or the folder containing the vuescan executable
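The location rules above can be sketched in Python as a lookup. This is a hedged illustration rather than anything VueScan provides: the function `candidate_folders` and its parameters are hypothetical, and because the folder that holds vuescan.log is not specified here, it is passed in by the caller.

```python
def candidate_folders(os_name, version=(9, 8, 35), log_folder=None, executable_folder=None):
    """Return the folders, in table order, where a language file can be stored.

    os_name is one of "macos", "windows" or "linux" (labels chosen for this sketch).
    log_folder is wherever vuescan.log lives on this machine (supplied by the caller).
    """
    if os_name == "macos":
        return ["/Users/Shared"]
    if os_name == "windows":
        if version < (9, 2):  # VueScan 9.1 and earlier
            return [r"c:\vuescan"]
        folders = [r"c:\Program Files\VueScan"]
        if log_folder:
            folders.insert(0, log_folder)
        return folders
    if os_name == "linux":
        # Either the vuescan.log folder or the folder holding the executable.
        return [f for f in (log_folder, executable_folder) if f]
    raise ValueError(f"unknown operating system: {os_name}")


print(candidate_folders("macos"))
print(candidate_folders("windows", version=(9, 1)))
```

Once a folder is chosen, the downloaded file can be placed there by hand or with a standard copy such as Python's `shutil.copy`.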

Supported OCR Languages

Click on one of the links below and save the file in the location described above. Additional languages and more accurate (albeit slower) trained data are available at https://github.com/tesseract-ocr. Note that the file name must use one of the three-letter language codes built into VueScan, as listed in the table below.

VueScan 9.8.34 and earlier  VueScan 9.8.35 and later    Language
ocr_bg.bin                  bul.traineddata             Bulgarian
ocr_ca.bin                  cat.traineddata             Catalan
ocr_zh.bin                  zho.traineddata             Chinese (Simplified)
ocr_tw.bin                  zht.traineddata             Chinese (Traditional)
ocr_cs.bin                  ces.traineddata             Czech
ocr_da.bin                  dan.traineddata             Danish
ocr_nl.bin                  nld.traineddata             Dutch
ocr_en.bin (built-in)       eng.traineddata (built-in)  English
ocr_fi.bin                  fin.traineddata             Finnish
ocr_fr.bin                  fra.traineddata (built-in)  French
ocr_de.bin                  deu.traineddata (built-in)  German
ocr_el.bin                  ell.traineddata             Greek
ocr_hu.bin                  hun.traineddata             Hungarian
ocr_id.bin                  ind.traineddata             Indonesian
ocr_it.bin                  ita.traineddata (built-in)  Italian
ocr_ja.bin                  jpn.traineddata             Japanese
ocr_ko.bin                  kor.traineddata             Korean
ocr_lv.bin                  lav.traineddata             Latvian
ocr_lt.bin                  lit.traineddata             Lithuanian
ocr_no.bin                  nor.traineddata             Norwegian
ocr_pl.bin                  pol.traineddata             Polish
ocr_pt.bin                  por.traineddata             Portuguese
ocr_ro.bin                  ron.traineddata             Romanian
ocr_ru.bin                  rus.traineddata             Russian
ocr_sr.bin                  srp.traineddata             Serbian
ocr_sk.bin                  slk.traineddata             Slovak
ocr_sl.bin                  slv.traineddata             Slovenian
ocr_es.bin                  spa.traineddata (built-in)  Spanish
ocr_sv.bin                  swe.traineddata             Swedish
ocr_th.bin                  tha.traineddata             Thai
ocr_tl.bin                  fil.traineddata             Tagalog
ocr_tr.bin                  tur.traineddata             Turkish
ocr_uk.bin                  ukr.traineddata             Ukrainian
ocr_vi.bin                  vie.traineddata             Vietnamese
-                           ara.traineddata             Arabic
-                           ben.traineddata             Bengali
-                           fas.traineddata             Persian
-                           guj.traineddata             Gujarati
-                           heb.traineddata             Hebrew
-                           hin.traineddata             Hindi
-                           mar.traineddata             Marathi
-                           tam.traineddata             Tamil
-                           tel.traineddata             Telugu
-                           urd.traineddata             Urdu
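
Because the file name has to use one of the three-letter codes from the table above, a quick check before copying a file can catch a mismatch. The Python sketch below is illustrative only: `VUESCAN_CODES` is simply the 9.8.35-and-later column typed out, and `is_recognized` is a hypothetical helper. For instance, Tesseract's own repositories distribute Simplified Chinese as chi_sim.traineddata, whereas the table lists zho.traineddata; whether renaming such a file is all that is needed is an assumption not confirmed here.

```python
# Three-letter codes copied from the "9.8.35 and later" column of the table.
VUESCAN_CODES = {
    "bul", "cat", "zho", "zht", "ces", "dan", "nld", "eng", "fin", "fra", "deu",
    "ell", "hun", "ind", "ita", "jpn", "kor", "lav", "lit", "nor", "pol", "por",
    "ron", "rus", "srp", "slk", "slv", "spa", "swe", "tha", "fil", "tur", "ukr",
    "vie", "ara", "ben", "fas", "guj", "heb", "hin", "mar", "tam", "tel", "urd",
}


def is_recognized(file_name: str) -> bool:
    """Return True if file_name looks like xxx.traineddata with a listed code."""
    stem, _, extension = file_name.rpartition(".")
    return extension == "traineddata" and stem in VUESCAN_CODES


print(is_recognized("zho.traineddata"))      # True
print(is_recognized("chi_sim.traineddata"))  # False
```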