Class PdfTextExtractor
java.lang.Object
com.itextpdf.text.pdf.parser.PdfTextExtractor
-
Constructor Summary
Modifier, Constructor, Description
private PdfTextExtractor()
    This class only contains static methods.
-
Method Summary
Modifier and Type, Method, Description
static String getTextFromPage(PdfReader reader, int pageNumber)
    Extract text from a specified page using the default strategy.
static String getTextFromPage(PdfReader reader, int pageNumber, TextExtractionStrategy strategy)
    Extract text from a specified page using an extraction strategy.
static String getTextFromPage(PdfReader reader, int pageNumber, TextExtractionStrategy strategy, Map<String, ContentOperator> additionalContentOperators)
    Extract text from a specified page using an extraction strategy.
-
Constructor Details
-
PdfTextExtractor
private PdfTextExtractor()
This class only contains static methods.
-
-
Method Details
-
getTextFromPage
public static String getTextFromPage(PdfReader reader, int pageNumber, TextExtractionStrategy strategy, Map<String, ContentOperator> additionalContentOperators) throws IOException
Extract text from a specified page using an extraction strategy. Also allows registration of custom ContentOperators.
Parameters:
    reader - the reader to extract text from
    pageNumber - the page to extract text from
    strategy - the strategy to use for extracting text
    additionalContentOperators - an optional map of custom ContentOperators for rendering instructions
Returns:
    the extracted text
Throws:
    IOException - if any operation fails while reading from the provided PdfReader
-
getTextFromPage
public static String getTextFromPage(PdfReader reader, int pageNumber, TextExtractionStrategy strategy) throws IOException
Extract text from a specified page using an extraction strategy.
Parameters:
    reader - the reader to extract text from
    pageNumber - the page to extract text from
    strategy - the strategy to use for extracting text
Returns:
    the extracted text
Throws:
    IOException - if any operation fails while reading from the provided PdfReader
Since:
    5.0.2
-
getTextFromPage
public static String getTextFromPage(PdfReader reader, int pageNumber) throws IOException
Extract text from a specified page using the default strategy.
Note: the default strategy is subject to change. If using a specific strategy is important, use getTextFromPage(PdfReader, int, TextExtractionStrategy).
Parameters:
    reader - the reader to extract text from
    pageNumber - the page to extract text from
Returns:
    the extracted text
Throws:
    IOException - if any operation fails while reading from the provided PdfReader
Since:
    5.0.2
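Example (illustrative sketch, not part of the library):
Because getTextFromPage works on one page at a time, callers that want the text of a whole document typically loop over the pages themselves. The sketch below shows only that loop. To stay runnable without iText on the classpath, the extraction call is hidden behind a hypothetical PageText interface; PageTextDemo, PageText, and extractAll are names invented here, and the 1-based page numbering is an assumption about iText 5 that this page does not state.

```java
import java.io.IOException;
import java.io.UncheckedIOException;

public class PageTextDemo {

    /** Hypothetical stand-in for a call such as PdfTextExtractor.getTextFromPage(reader, pageNumber). */
    public interface PageText {
        String get(int pageNumber) throws IOException;
    }

    /**
     * Concatenates the text of pages 1..pageCount, separated by newlines.
     * Assumption: page numbers are 1-based, as is believed to be the case in iText 5.
     */
    public static String extractAll(PageText source, int pageCount) {
        StringBuilder out = new StringBuilder();
        for (int page = 1; page <= pageCount; page++) {
            if (page > 1) {
                out.append('\n');
            }
            try {
                out.append(source.get(page));
            } catch (IOException e) {
                // Rethrow unchecked so callers are not forced to declare IOException.
                throw new UncheckedIOException("Failed to read page " + page, e);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Fake page source so the sketch runs without a PDF file or iText.
        PageText fake = page -> "text of page " + page;
        System.out.println(extractAll(fake, 3));
    }
}
```

With iText 5 available, the same helper could presumably be called as extractAll(page -> PdfTextExtractor.getTextFromPage(reader, page), reader.getNumberOfPages()), where reader is an open PdfReader. Since the note above warns that the default strategy may change, passing an explicit TextExtractionStrategy through the three-argument overload is the safer choice when output stability matters.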
-