GetText

The GetText component enables users to extract text from a PDF file.

ScreenShot

Supported Functions

Double-click the GetText component title bar to open the EXTRACTOR SETTINGS window.

ScreenShot

  1. Select the checkbox to use OCR for extraction.
    a. Select an OCR engine from the list.
    b. If you select a commercial OCR engine from the list, provide its credentials
    (for example: Access Key, Secret Key, API Key, Application Id, Cloud Password, or License Key).
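The rule in the list above (credentials are needed only when OCR is enabled and a commercial engine is selected) can be sketched in Python. This is an illustrative model only, not the product's API: the names ExtractorSettings and COMMERCIAL_ENGINES and the engine name ExampleCloudOcr are hypothetical, and "Windows" is assumed here to need no credentials.

```python
from dataclasses import dataclass, field

# Hypothetical set of commercial engines; the real list comes from the product UI.
COMMERCIAL_ENGINES = {"ExampleCloudOcr"}


@dataclass
class ExtractorSettings:
    """Illustrative model of the EXTRACTOR SETTINGS window (not a real API)."""
    use_ocr: bool = False
    engine: str = "Windows"
    credentials: dict = field(default_factory=dict)

    def validate(self) -> None:
        """Raise ValueError if a commercial engine is selected without credentials."""
        if not self.use_ocr:
            return  # OCR is off, so the engine and credentials are ignored
        if self.engine in COMMERCIAL_ENGINES and not self.credentials:
            raise ValueError(f"{self.engine} requires credentials (for example, an API key)")


# The engine assumed to be non-commercial passes validation without credentials.
ExtractorSettings(use_ocr=True, engine="Windows").validate()
```

Keeping the check in one place mirrors the dialog: the engine list and credential fields only matter once the OCR checkbox is selected.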

Ports

The GetText component exposes the Control In, Control Out, Data In and Data Out ports by default.

Port         Description
Control In   Must be connected to the Control Out port of one or more components.
Control Out  Can be connected to the Control In port of another component or the default end component.
Data In      The GetText component exposes the following Data In ports by default:
             Filepath: specifies the location of the PDF file.
             PageNumber: specifies the page number of the PDF file.
Data Out     Returns the content of the PDF document.

Properties

To edit the properties of the GetText component, change the required property in the Properties window. You can edit the following properties.

Property                Description
Search                  Search for the respective property.
Delay After Execution   Specifies the wait time (in seconds) after the action is performed.
Delay Before Execution  Specifies the wait time (in seconds) before the action is performed.
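
The two delay properties can be read as a pause before and a pause after the component's action. The following Python sketch shows that ordering under stated assumptions: run_with_delays and its parameters are hypothetical names for illustration, and the lambda stands in for the actual extraction step.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_delays(action: Callable[[], T],
                    delay_before: float = 0.0,
                    delay_after: float = 0.0,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Wait, run the action, wait again, then return the action's result."""
    sleep(delay_before)   # Delay Before Execution (seconds)
    result = action()
    sleep(delay_after)    # Delay After Execution (seconds)
    return result


# Example: wait 0.1 s before and 0.2 s after a placeholder extraction step.
text = run_with_delays(lambda: "extracted text", delay_before=0.1, delay_after=0.2)
```

The sleep parameter is injectable only so that the ordering can be checked without real waiting.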

Example

Let us consider an example.

ScreenShot

To extract data from a PDF file:

  1. In the Toolbox, expand Utilities and then expand PDF.

  2. Drag the GetText component and drop it on the Design surface.

  3. Double-click the Filepath box and enter the required path.

  4. Optionally, specify the page number of the PDF file for page-specific extraction.

  5. To override the existing data source, right-click Pdf FilePath.

  6. Click Override and change the data source.

    Note

    To learn more about overriding the data source of the data port, refer to the Override section.

  7. Double-click the GetText title bar. The EXTRACTOR SETTINGS window opens.

    ScreenShot

  8. Select the checkbox and select an OCR engine from the list.
    If you use a commercial OCR engine, provide the required credentials.
    (In this example, “Windows” is selected as the OCR Engine Type.)

  9. Click Ok.

  10. Drag the MessageBox show component and drop it on the Design surface.

  11. Connect the control ports and the data ports in the activity.

  12. In the toolbar, click Run.

    ScreenShot