Foxit PDF Editor can detect whether a PDF file is scanned or image-based and, when you open such a file, prompts you to run OCR. You can also run OCR at any time to recognize the image-based text in a PDF.
To recognize image-based or scanned text in a PDF file, perform the following steps:
- Click Convert > Recognize Text > Current File.
- In the Recognize Text dialog box, specify the page range you need.
- Choose the language used in your document. You can select multiple languages as well.
- For the output type, select Searchable Image Text or Searchable Image Text (original image) to make the image text selectable and searchable, or select Editable Text to make the image text editable in Foxit PDF Editor. With Searchable Image Text or Editable Text selected, you can set a DPI value for the output in the Downsample To item to compress images in the document and reduce the file size during the OCR process.
  - Searchable Image Text/Searchable Image Text (original image): During the OCR process, Foxit PDF Editor analyzes the image text and substitutes words/characters that closely approximate it. The substitute words/characters are placed on an invisible text layer in the PDF, which makes the image text selectable and searchable. If a substitution is uncertain, the text is marked as an OCR suspect, which needs to be corrected manually.
  - Editable Text: During the OCR process, Foxit PDF Editor compares the shapes in the image text with similar fonts installed on your system and turns the image text into editable text.
Note: If you are prompted to download the OCR component after clicking OK, click Yes to download and install it. Alternatively, download it later from the link provided, and then install it by clicking Install Plugin in the About Plug-in Management dialog box, which opens when you click Plug-in Management in the Help tab. (Tip: To install a plug-in in MSI format, double-click it.)
- (Optional) If you check Find All Suspect (Show all OCR results that may need to be changed.), the OCR Suspects dialog box opens right after recognition completes so that you can check and correct OCR suspects. To learn how to correct OCR suspects, refer to Find and Correct OCR Suspects.
If you choose Editable Text as the output type and select the Find All Suspect (Show all OCR results that may need to be changed.) option, the recognized text that Foxit PDF Editor is not certain about is marked as OCR suspects, and the original image text is kept until you have manually handled all the OCR suspects. If you deselect this option, the image text is turned into editable text with no OCR suspects after recognition. If needed (for example, if some text was not recognized correctly), you can modify the text directly using the commands in the Edit tab.
- (Optional) If you select Editable Text as the output type, the Recognize the line segments as path objects in the PDF option is available. If the image text in your document contains tables, selecting this option improves recognition of the line segments, but recognition may take longer to complete.
- Click OK. A progress bar appears to show the recognition progress.
- After recognition completes, the text in your image-based or scanned document is searchable or editable, depending on the output type you selected. You can use the search function to verify the result.
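The OCR-suspect behavior described in the steps above can be pictured with a small sketch. This is not Foxit PDF Editor's implementation: the `RecognizedWord` type, the confidence scores, and the 0.80 threshold are all hypothetical, chosen only to illustrate how uncertain substitutions might be set aside for manual review.

```python
from dataclasses import dataclass

# Hypothetical cutoff; the criterion Foxit PDF Editor actually uses is not stated here.
SUSPECT_THRESHOLD = 0.80


@dataclass
class RecognizedWord:
    text: str          # substitute text produced by recognition
    confidence: float  # illustrative score in the range [0, 1]


def split_suspects(words, threshold=SUSPECT_THRESHOLD):
    """Return (accepted, suspects); suspects are words scored below the threshold."""
    accepted = [w for w in words if w.confidence >= threshold]
    suspects = [w for w in words if w.confidence < threshold]
    return accepted, suspects


# Made-up sample data for illustration only.
sample = [
    RecognizedWord("Invoice", 0.98),
    RecognizedWord("t0tal", 0.55),
    RecognizedWord("amount", 0.91),
]
accepted, suspects = split_suspects(sample)
print([w.text for w in suspects])
```

In this sketch, only the low-scoring word ends up in the list that would need manual correction.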
Tip: Foxit PDF Editor provides the Quick Recognition command under the Home/Convert tab, which recognizes all pages of a scanned or image-based PDF with one click. It uses the default settings, or the settings you specified in the Recognize Text dialog box the last time you used the Recognize Text command.
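To see why a lower Downsample To value reduces file size, consider how pixel counts scale with DPI. The sketch below is plain arithmetic, not Foxit PDF Editor code. The function name and the 8.5 by 11 inch page size are assumptions chosen for illustration, and actual file sizes also depend on how the images are compressed.

```python
def pixel_count(width_in: float, height_in: float, dpi: int) -> int:
    """Number of pixels in a page image of the given physical size at the given DPI."""
    return round(width_in * dpi) * round(height_in * dpi)


# Assumed example: a letter-size page scanned at 600 DPI, then downsampled to 300 DPI.
original = pixel_count(8.5, 11, 600)
downsampled = pixel_count(8.5, 11, 300)
print(original, downsampled, original / downsampled)
```

Halving the DPI quarters the pixel count, because both the width and the height in pixels are halved.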
To recognize text in multiple files:
- Click Convert > Recognize Text > Multiple Files.
- In the Recognize Text dialog box, click Add Files to add files, folders, or currently opened files. Use Move up, Move down, and Remove to adjust the order of the files.
- Click Output Options…. In the Output Options dialog box, select the destination folder, choose how to name the new file and whether to overwrite an existing one, and then click OK.
- Click OK.
Notes:
- The first time you use the CJK OCR engine, you are prompted to download and install the engine from the Foxit server.
- If you add any unsupported files, a “Remove unsupported file(s)” button appears in the Recognize Text dialog box. Click the button to remove the unsupported file(s), and then continue. When recognizing a PDF portfolio, Foxit PDF Editor extracts and recognizes only the PDF files in the portfolio.
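The output choices in the multiple-file procedure above (destination folder, file naming, and whether to overwrite) can be illustrated with a short sketch. The `_OCR` suffix, the numbered fallback names, and the `output_path` function are hypothetical. They show one plausible naming scheme, not the options that the Output Options dialog box actually offers.

```python
from pathlib import PurePosixPath


def output_path(source, dest_folder, suffix="_OCR", overwrite=False, existing=frozenset()):
    """Build an output path in dest_folder for a recognized copy of source.

    If overwrite is False and the name is already taken (listed in `existing`),
    append (1), (2), ... until an unused name is found.
    """
    src = PurePosixPath(source)
    candidate = PurePosixPath(dest_folder) / f"{src.stem}{suffix}{src.suffix}"
    counter = 1
    while not overwrite and str(candidate) in existing:
        candidate = PurePosixPath(dest_folder) / f"{src.stem}{suffix}({counter}){src.suffix}"
        counter += 1
    return str(candidate)


# Assumed example paths for illustration only.
taken = {"out/report_OCR.pdf"}
print(output_path("scans/report.pdf", "out"))
print(output_path("scans/report.pdf", "out", existing=taken))
print(output_path("scans/report.pdf", "out", overwrite=True, existing=taken))
```

Passing the set of existing names as a parameter keeps the sketch free of file-system access, so the naming logic can be checked on its own.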
To recognize a selected area on a PDF page (Available in Pro Only):
- Click Convert > Recognize Text > Selected Region.
- The cursor changes into a cross automatically.
- Click and drag a rectangle around the area you want to recognize.
- Right-click the selected area and choose Recognize Selected Region.
- In the pop-up dialog box, choose the language(s) used in your document, and click OK to start recognition.