Adding OCR Languages in VueScan

VueScan 9.8.35 and later has built-in Optical Character Recognition (OCR) for English, Spanish, German, French and Italian. VueScan 9.8.34 and earlier has built-in OCR for English only.

VueScan 9.8.34 and earlier uses Google’s Tesseract 3, and VueScan 9.8.35 and later uses Tesseract 5.

The table below lists 44 languages. To use one that is not built in, download the matching ocr_xx.bin file (for VueScan 9.8.34 and earlier) or xxx.traineddata file (for VueScan 9.8.35 and later).

These files contain data about the character set used in each language, and the OCR results for a language will be better if you use its file.

To add support for additional languages in the “Output | OCR text language” option, you need to download a language-specific file. Store this file on your hard drive in one of the following locations:

Operating System	File Location
macOS	/Users/Shared
Windows (VueScan 9.1 and earlier)	c:\vuescan
Windows (VueScan 9.2 and later)	same location as vuescan.log or c:\Program Files\VueScan
Linux	same location as vuescan.log or the vuescan executable
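
The location table above can be sketched as a small lookup. This is only an illustration, not part of VueScan: the function names, the system labels and the `log_dir` and `exe_dir` parameters are assumptions made for the example. The folders holding vuescan.log and the vuescan executable depend on your installation, so the caller has to supply them.

```python
import shutil
from pathlib import Path


def candidate_dirs(system, version, log_dir=None, exe_dir=None):
    """Return the folders from the table above where a language file may go.

    system  -- "macOS", "Windows" or "Linux" (labels chosen for this sketch)
    version -- VueScan version as a tuple, e.g. (9, 8, 35)
    log_dir -- folder that holds vuescan.log, if you know it
    exe_dir -- folder that holds the vuescan executable, if you know it
    """
    dirs = []
    if system == "macOS":
        dirs.append("/Users/Shared")
    elif system == "Windows":
        if version < (9, 2):
            # VueScan 9.1 and earlier
            dirs.append(r"c:\vuescan")
        else:
            # VueScan 9.2 and later
            if log_dir:
                dirs.append(log_dir)
            dirs.append(r"c:\Program Files\VueScan")
    elif system == "Linux":
        # Both Linux locations are installation-specific.
        dirs.extend(d for d in (log_dir, exe_dir) if d)
    else:
        raise ValueError(f"unknown system: {system!r}")
    return dirs


def install_language_file(downloaded_file, target_dir):
    """Copy a downloaded language file into target_dir, keeping its name."""
    target = Path(target_dir) / Path(downloaded_file).name
    shutil.copy2(downloaded_file, target)
    return target
```

For example, `candidate_dirs("Windows", (9, 8, 35))` returns only `c:\Program Files\VueScan` because no vuescan.log folder was given. Copying into that folder may require administrator rights.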

Supported OCR Languages

Click on one of the links below and save the file in the location described above. You can find additional languages and more accurate (albeit slower) trained data at https://github.com/tesseract-ocr. Note that the file name must use one of the three-letter language codes built into VueScan, as listed in the table below.

9.8.34 and earlier	9.8.35 and later	Language
ocr_bg.bin	bul.traineddata	Bulgarian
ocr_ca.bin	cat.traineddata	Catalan
ocr_zh.bin	zho.traineddata	Chinese (Simplified)
ocr_tw.bin	zht.traineddata	Chinese (Traditional)
ocr_cs.bin	ces.traineddata	Czech
ocr_da.bin	dan.traineddata	Danish
ocr_nl.bin	nld.traineddata	Dutch
ocr_en.bin (built-in)	eng.traineddata (built-in)	English
ocr_fi.bin	fin.traineddata	Finnish
ocr_fr.bin	fra.traineddata (built-in)	French
ocr_de.bin	deu.traineddata (built-in)	German
ocr_el.bin	ell.traineddata	Greek
ocr_hu.bin	hun.traineddata	Hungarian
ocr_id.bin	ind.traineddata	Indonesian
ocr_it.bin	ita.traineddata (built-in)	Italian
ocr_ja.bin	jpn.traineddata	Japanese
ocr_ko.bin	kor.traineddata	Korean
ocr_lv.bin	lav.traineddata	Latvian
ocr_lt.bin	lit.traineddata	Lithuanian
ocr_no.bin	nor.traineddata	Norwegian
ocr_pl.bin	pol.traineddata	Polish
ocr_pt.bin	por.traineddata	Portuguese
ocr_ro.bin	ron.traineddata	Romanian
ocr_ru.bin	rus.traineddata	Russian
ocr_sr.bin	srp.traineddata	Serbian
ocr_sk.bin	slk.traineddata	Slovak
ocr_sl.bin	slv.traineddata	Slovenian
ocr_es.bin	spa.traineddata (built-in)	Spanish
ocr_sv.bin	swe.traineddata	Swedish
ocr_th.bin	tha.traineddata	Thai
ocr_tl.bin	fil.traineddata	Tagalog
ocr_tr.bin	tur.traineddata	Turkish
ocr_uk.bin	ukr.traineddata	Ukrainian
ocr_vi.bin	vie.traineddata	Vietnamese
	ara.traineddata	Arabic
	ben.traineddata	Bengali
	fas.traineddata	Persian
	guj.traineddata	Gujarati
	heb.traineddata	Hebrew
	hin.traineddata	Hindi
	mar.traineddata	Marathi
	tam.traineddata	Tamil
	tel.traineddata	Telugu
	urd.traineddata	Urdu
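
As a rough illustration of how the table above maps a language to a file name, the sketch below returns the file that a given VueScan version expects and whether it has to be downloaded. The data is copied from the table. The function and variable names are assumptions made for this example and are not part of VueScan.

```python
# Three-letter code -> two-letter code used in the older ocr_xx.bin names.
LEGACY_CODES = {
    "bul": "bg", "cat": "ca", "zho": "zh", "zht": "tw", "ces": "cs",
    "dan": "da", "nld": "nl", "eng": "en", "fin": "fi", "fra": "fr",
    "deu": "de", "ell": "el", "hun": "hu", "ind": "id", "ita": "it",
    "jpn": "ja", "kor": "ko", "lav": "lv", "lit": "lt", "nor": "no",
    "pol": "pl", "por": "pt", "ron": "ro", "rus": "ru", "srp": "sr",
    "slk": "sk", "slv": "sl", "spa": "es", "swe": "sv", "tha": "th",
    "fil": "tl", "tur": "tr", "ukr": "uk", "vie": "vi",
}

# Languages listed only for VueScan 9.8.35 and later.
NEW_ONLY = {"ara", "ben", "fas", "guj", "heb", "hin", "mar", "tam", "tel", "urd"}

# Languages marked "(built-in)" in each column of the table.
BUILT_IN_NEW = {"eng", "spa", "deu", "fra", "ita"}
BUILT_IN_OLD = {"eng"}


def language_file(code, version):
    """Return (file_name, needs_download) for a three-letter language code.

    version is the VueScan version as a tuple, e.g. (9, 8, 35).
    Raises ValueError if the table has no file for that code and version.
    """
    if code not in LEGACY_CODES and code not in NEW_ONLY:
        raise ValueError(f"language code not in the table: {code!r}")
    if version >= (9, 8, 35):
        return f"{code}.traineddata", code not in BUILT_IN_NEW
    if code in NEW_ONLY:
        raise ValueError(f"{code!r} is listed only for VueScan 9.8.35 and later")
    return f"ocr_{LEGACY_CODES[code]}.bin", code not in BUILT_IN_OLD
```

For example, `language_file("nld", (9, 8, 35))` gives `("nld.traineddata", True)`, while `language_file("fra", (9, 8, 35))` gives `("fra.traineddata", False)` because French is built in from 9.8.35 onward.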