Installing Tesseract for Squish

Squish uses the free Tesseract OCR library as its primary engine for text recognition. To use the Tesseract OCR engine, the package, including all of the language files, must be installed separately from Squish. Other OCR engines can potentially be used with Squish instead.

Tesseract for Squish is supplied as a single, easy-to-install binary package that contains the engine libraries and the full set of language files. The packages for all supported platforms can be found in the Downloads section of Qt Customer Portal after choosing Squish as a product. To locate the Tesseract packages easily, filter the list of packages by edition and select Special as the edition.

OCR Functionality in Squish

Optical Character Recognition (OCR) is a technology that enables the digitization of scanned images with printed or handwritten text into machine-readable data that can later be used for electronic editing. Image sources fed to OCR software include image-only PDFs, scanned documents, handwritten manuscripts or camera images, among others.

The applications of OCR technology are wide and varied. They include automatic data entry, passport recognition in airports, digitizing dated newspapers, automatic number-plate recognition and assistive technology for the visually impaired. The advantages of using OCR to digitize text are clear: OCR saves storage space by converting paper documents into electronic documents; printed texts become far easier to search; once a text has been converted into machine-encoded text, it can be revised with a standard word processor; and digital backups of printed text (e.g., legal paperwork or newspapers) can be made frequently and kept more securely than documents in printed form.

Squish includes OCR as a complement to its already powerful Object-based and Image-based recognition methods. When creating platform-independent tests, the visual appearance of onscreen text varies particularly widely, due to a wide assortment of fonts, font sizes, decorations and rendering modes. Image-based recognition methods, including Fuzzy Image Search, are therefore generally unsuitable for locating text onscreen. OCR allows for efficient text handling in those scenarios where the same text is rendered with different parameters, making it look largely dissimilar in a pixel-to-pixel comparison (for example, due to varying letter widths, different kerning or shifting line break positions).

Configuring the Package

Download the Tesseract for Squish package for your operating system from Qt Customer Portal onto your computer, and execute it.

On Linux, you first need to make the downloaded .run file executable. Popular desktop environments allow this by right-clicking the file and enabling the Execute permission. You can also make the installer executable on the command line by issuing the following command:

$ chmod a+x tesseract-4.0.0-for-squish.x64.run
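
For scripted setups, the permission change can be wrapped in a small helper. The following is a minimal sketch only; the function name ensure_executable is hypothetical and is not part of the Tesseract for Squish package.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with Tesseract for Squish):
# make a downloaded installer executable if it is not already.
ensure_executable() {
    # $1 = path to the downloaded .run file
    if [ ! -f "$1" ]; then
        echo "installer not found: $1" >&2
        return 1
    fi
    # Only change permissions when the execute bit is missing.
    [ -x "$1" ] || chmod a+x "$1"
}
```

For example, calling ensure_executable tesseract-4.0.0-for-squish.x64.run prepares the installer named above and reports an error if the file is missing.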

The installation program will guide you through the configuration process by presenting multiple pages.

"Tesseract for Squish setup program"

Note: Once you start the installer, you can go back to change a configuration setting using the Back button and proceed to the following pages using the Next button.

Installation Folder

This step determines the location on your system where Tesseract for Squish will be installed.

"Target selection page"

Acknowledging the Terms of Use

"Apache license test page"

After selecting the installation folder, you will be presented with the license under which you are permitted to use your copy of Tesseract for Squish. Read the entire license text carefully. Click one of the two radio buttons (I accept the license. or I do not accept the license.) that appear below the license text to indicate whether you agree or disagree with the terms. If you disagree, you cannot install or use Tesseract for Squish. To terminate the installation, click the Cancel button.

If you accept the license, the Next button will become enabled, and you can proceed to the next step of the configuration process.

Tesseract Engine Registration

In order to use Tesseract with Squish, its installation path needs to be registered with Squish. The Tesseract for Squish package installer will perform the registration during the installation if you select Register the Tesseract installation with Squish.

"Qt library configuration page"

Note: If you choose not to register the Tesseract installation with Squish, you can do so later by entering the chosen installation path in the Squish IDE OCR pane or by editing the ocr.ini file manually.

Ready to Install

At this point, all the configuration options have been set and the installation is ready to launch. The page shows the disk space required by the Tesseract for Squish installation.

"Configuration review page"

Executing the Installation

The installation program now commences installing Tesseract for Squish on your system. You can click the Show Details button to get a detailed list of actions performed as part of the installation.

"Installing a package"

You can close the installer at any time, for example by closing the window or by clicking the Cancel button (only visible on platforms other than macOS). All changes made so far will be rolled back.

Concluding the Configuration

Congratulations! You have finished installing Tesseract for Squish. This page concludes the setup of your Tesseract for Squish binary package.

"Final page"

Click the Finish button to close the installation program.

Performing Unattended Installations

It is possible to install Tesseract for Squish completely unattended by passing any required values up front. An unattended installation requires no user interaction and produces the same result as working through the installer interface manually. To perform an unattended installation, invoke the Tesseract for Squish installation program from the command line, passing at least the argument unattended=1:

$ ./tesseract-4.0.0-for-squish.x64.run unattended=1 more options...

That argument will launch the installation without any graphical user interface. Instead, progress information and potential error messages are written to the console.

In addition to the unattended=1 argument, you may want to pass targetdir=PATH to set the target installation directory, or register=0 to disable the automatic registration of the engine with Squish:

$ ./tesseract-4.0.0-for-squish.x64.run unattended=1 targetdir=/opt/tesseract register=0
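
When the unattended installation is part of a provisioning script, the invocation can be wrapped in a function. The following is a minimal sketch under stated assumptions: only the arguments unattended=1, targetdir=PATH and register=0 come from this page, while the function name run_unattended_install and its parameter order are hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper around the unattended installation described above.
# Only unattended=1, targetdir=PATH and register=... are documented
# installer arguments; everything else here is illustrative.
run_unattended_install() {
    # $1 = path to the installer, $2 = target directory,
    # $3 = register flag (0 disables registration with Squish)
    if [ ! -f "$1" ]; then
        echo "installer not found: $1" >&2
        return 1
    fi
    # Make sure the installer can be executed (needed on Linux).
    chmod a+x "$1" || return 1
    # Quote each argument so that paths containing spaces survive.
    "$1" unattended=1 "targetdir=$2" "register=$3"
}
```

Calling run_unattended_install ./tesseract-4.0.0-for-squish.x64.run /opt/tesseract 0 would issue the same command line as the example above. Progress information and error messages still appear on the console, as they do for a direct invocation.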

© 2023 The Qt Company Ltd. Documentation contributions included herein are the copyrights of their respective owners.
The documentation provided herein is licensed under the terms of the GNU Free Documentation License version 1.3 as published by the Free Software Foundation.
Qt and respective logos are trademarks of The Qt Company Ltd. in Finland and/or other countries worldwide. All other trademarks are property of their respective owners.