OCR Processes

Prior to this release, PrimeOCR was the only supported OCR program, which is how the PIP (PrimeOCR Interface Program) acronym originated. Because this release adds support for Scansoft OCR, the maintenance records have been renamed to OCR Processes.

 

To manage the OCR Processes in Admin, select OCR Processes from the Maintain menu. Records in the list can be created, changed, and deleted using the standard list buttons described in the FaxAction Admin Manual. In previous versions, the OCR processes could be maintained only through the Manage PIP button on the fax and image Input Server records.

 

Additions to and deletions from this list only take effect when FaxAction Server Monitor is stopped and restarted or is asked to re-read its configuration through FaxAction Server Admin.

 

Changes to the Program Name, Disabled, and Server fields in the form require FaxAction Server Monitor to be reconfigured. Changes to all other fields take effect only when the appropriate OCR process is stopped and restarted or is asked to re-read its configuration.
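
The rule above can be summarized in a short sketch. This is only an illustration: the function name and the action labels are hypothetical and are not part of FaxAction; only the three field names come from the text.

```python
# Fields whose changes require FaxAction Server Monitor to be reconfigured
# (names taken from the paragraph above).
MONITOR_FIELDS = {"Program Name", "Disabled", "Server"}


def required_actions(changed_fields):
    """Return the actions needed for a set of edited fields.

    The action labels are illustrative strings, not FaxAction commands.
    """
    changed = set(changed_fields)
    actions = []
    if changed & MONITOR_FIELDS:
        actions.append("reconfigure Server Monitor")
    if changed - MONITOR_FIELDS:
        actions.append("restart or re-read OCR process")
    return actions


print(required_actions(["Disabled"]))
print(required_actions(["Server", "Log Comm?"]))
```

An edit that touches fields from both groups needs both actions, which is why the function returns a list rather than a single value.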

 

Record Details

 

image\admin_ocr_processes.gif

 

Program should contain the fully qualified name of the PrimeOCR Interface Program (PIP) file on the server. This field is required, but the path is not validated because Admin could be running on a machine other than the FaxAction server.

 

When pressed, the Find button displays a standard Windows Find File dialog to locate the application file. Use this button only when Admin is running on the server, so that the path entered is correct for the server.

 

If the Disabled field is checked, the process will not be started the next time that Server Monitor is reconfigured. If the process is running when Server Monitor is reconfigured, it will automatically be stopped.

 

The Server To Run On field specifies the name of the machine on which the process is to run. If this field is left blank, the process runs on the same machine as the database. If clustering software is used, do not enter a specific machine name unless the process is to run on a server that is not part of the database cluster, because the name of the machine the database runs on will likely change when a failover occurs. This field is not validated to ensure that the specified server exists. The machine name can be entered either in UNC notation (e.g. \\FAXACTION) or in "plain English" notation (e.g. FAXACTION).
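
As an illustration of the two accepted notations, the following sketch reduces either form to a bare machine name and treats a blank entry as "run on the database machine". The function name is hypothetical, and how FaxAction itself interprets the field is not documented here.

```python
def normalize_machine_name(value):
    """Return the bare machine name, or None when the field is blank.

    Accepts UNC notation (\\\\FAXACTION) or a plain name (FAXACTION).
    None stands for "same machine as the database" in this sketch.
    """
    name = value.strip()
    if name.startswith("\\\\"):
        name = name[2:]  # drop the two leading backslashes of UNC notation
    return name or None


print(normalize_machine_name("\\\\FAXACTION"))
print(normalize_machine_name("FAXACTION"))
print(normalize_machine_name(""))
```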

 

If the new Log Comm? field is checked, communication by the OCR process with the OCR program will be logged to the log file directory configured in the General Info record. The purpose of this functionality is to aid in determining the point at which OCRing fails on a particular image. The file name will be COMM_OCR_1.LOG, where 1 is replaced with the appropriate OCR process number. Note that logging capabilities vary greatly between OCR programs.
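
The naming rule can be sketched as follows, for example when locating the log for a given process. The function name and the example directory are hypothetical; the actual log directory is whatever is configured in the General Info record.

```python
import ntpath  # Windows path rules, usable on any platform


def comm_log_path(log_dir, process_number):
    """Build the expected communication log path for one OCR process."""
    return ntpath.join(log_dir, f"COMM_OCR_{process_number}.LOG")


# Hypothetical log directory, for illustration only.
print(comm_log_path("C:\\FaxAction\\Logs", 2))
```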

The type of OCR program that is licensed by the site and used for the current OCR process is specified in the Type dropdown list. The available values are PrimeOCR and Scansoft. Selecting an OCR program that is not present on the FaxAction server will render the OCR process unusable when it is started.

 

The OCR Processing Directory contains the fully qualified name of the directory on the FaxAction server that the OCR process uses to store the files it is working with. This field is required, but the path is not validated because Admin could be running on a machine other than the FaxAction server.
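
Because Admin does not validate the Program or OCR Processing Directory paths, it can be useful to confirm by hand that an entry is at least fully qualified. The sketch below performs a purely syntactic check that never touches the file system; it is not something Admin does, and the function name and example paths are hypothetical.

```python
from pathlib import PureWindowsPath


def is_fully_qualified(path_text):
    """Return True if the text looks like an absolute Windows path.

    Only the syntax is checked (drive letter or UNC share plus root),
    so the check behaves the same on any machine.
    """
    text = path_text.strip()
    if not text:
        return False  # the field is required
    return PureWindowsPath(text).is_absolute()


print(is_fully_qualified("C:\\FaxAction\\OCR"))  # hypothetical directory
print(is_fully_qualified("OCR\\work"))
```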

 

When pressed, the Find button displays a standard Windows Find Directory dialog to locate the OCR processing directory. Use this button only when Admin is running on the server, so that the path entered is correct for the server.

 

Prime OCR Settings

 

This frame is automatically displayed when the PrimeOCR Type is selected. Most of the fields in this frame should be left at their default values because they control features that must be licensed from Prime Recognition.

 

The Level list specifies the number of OCR engines that have been licensed by the site. The default is LEVEL_3.

 

Language specifies the most likely language that the documents contain. The default is US.

 

The file format that OCRed text is produced in is specified in the Output list. Formatted ASCII is the default.

 

The CPUs field contains the number of CPUs that PrimeOCR has been licensed to run on. The default is 1.

 

If Stop OCR Early if Low/High Quality is checked, PrimeOCR will automatically stop OCRing an image when any of the engines reports that its confidence in the OCR results is below 60% or above 90%. The effective result of this setting is that very low quality images are not OCRed, freeing CPU time for other images or processing. This option is checked by default.
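
The stop condition described above can be expressed as a small sketch. It only models the stated rule (any engine below 60% or above 90%); the function and constant names are hypothetical, and PrimeOCR's internal implementation is not documented here.

```python
LOW_THRESHOLD = 60.0   # percent, from the description above
HIGH_THRESHOLD = 90.0  # percent, from the description above


def should_stop_early(engine_confidences, enabled=True):
    """Return True if any engine's confidence falls outside 60%..90%.

    `enabled` mirrors the Stop OCR Early if Low/High Quality checkbox.
    """
    if not enabled:
        return False
    return any(c < LOW_THRESHOLD or c > HIGH_THRESHOLD
               for c in engine_confidences)


print(should_stop_early([75.0, 95.0]))
print(should_stop_early([70.0, 80.0]))
```

Confidences of exactly 60% or 90% do not trigger an early stop in this sketch, because the text says "below" and "above".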

Do Lexical Checking instructs PrimeOCR to perform lexical analysis, including spell checking, on the OCR results in order to improve the resulting text. This option is checked by default.

 

When ScanFix is checked, PrimeOCR preprocesses the image to prepare it for OCRing. This is a purchasable option. The default is unchecked.

 

The Deskew option is another purchasable option used by PrimeOCR to automatically deskew received images prior to OCRing them. The default is unchecked.
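
For quick reference, the PrimeOCR defaults listed in this section can be collected in one place. The sketch below is only a summary aid: the class and attribute names are hypothetical and do not correspond to FaxAction database columns; only the default values come from the text.

```python
from dataclasses import dataclass


@dataclass
class PrimeOcrSettings:
    """Defaults as documented in the PrimeOCR settings frame."""
    level: str = "LEVEL_3"
    language: str = "US"
    output: str = "Formatted ASCII"
    cpus: int = 1
    stop_early: bool = True
    lexical_checking: bool = True
    scanfix: bool = False   # purchasable option
    deskew: bool = False    # purchasable option


def non_default_fields(settings):
    """List the attributes that differ from the documented defaults."""
    defaults = PrimeOcrSettings()
    return sorted(name for name, value in vars(settings).items()
                  if getattr(defaults, name) != value)


print(non_default_fields(PrimeOcrSettings(scanfix=True, cpus=2)))
```

Listing the non-default fields is a simple way to spot settings that may depend on a license from Prime Recognition.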

 

Scansoft Settings

 

This frame is automatically displayed when the Scansoft Type is selected.

 

image\admin_ocr_processes_scansoft.gif

 

The Output list contains the file formats that the OCRed text can be produced in. If the site is licensed to use the Scansoft client application within the FaxAction Windows client, the IPRO format should be selected, even if FaxAction Macintosh clients are in use. Otherwise, the Rich Text Format for Windows format is recommended.

 

The amount of text formatting (spacing, fonts, font faces, etc.) that is retained is selected using the Format list. The type of information that is preserved depends on the capabilities of the selected Output format. The available values are:

 

·     Full Formatting Retained causes all formatting to be retained, including the text zoning, which is converted to frames in the output document. This setting should be used for the IPRO output format.

 

·     Partial Formatting Retained adds character and paragraph formatting information to the output.

 

·     Minimal Formatting Retained preserves only the recognized characters.
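
The Output and Format recommendations in this section can be combined into a simple advisory check. This is a sketch under stated assumptions: the function name, the license flag, and the advice wording are hypothetical, and Admin itself does not perform this check.

```python
IPRO = "IPRO"
RTF_WINDOWS = "Rich Text Format for Windows"
FULL = "Full Formatting Retained"


def scansoft_advice(output, fmt, has_client_license):
    """Return advisory notes for a Scansoft Output/Format combination.

    `has_client_license` stands for a site licensed to use the Scansoft
    client application within the FaxAction Windows client.
    """
    advice = []
    if has_client_license and output != IPRO:
        advice.append("select the IPRO output format")
    if not has_client_license and output != RTF_WINDOWS:
        advice.append("consider Rich Text Format for Windows")
    if output == IPRO and fmt != FULL:
        advice.append("use Full Formatting Retained with IPRO")
    return advice


print(scansoft_advice(IPRO, "Partial Formatting Retained", True))
```

An empty list means the combination matches the recommendations given above.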