In this video I show how you can extract text from a PDF using Java.
All the dependencies and the project source code are below.
1. Project Structure in Eclipse IDE.
2. All the dependency jars are shown below. You have to download all of them and put them on the class path.
You can see in the video how to put them on the class path.
Download all the jars here: pdfbox jars
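If you are not sure the jars really ended up on the class path, a quick check like the one below can help before you run the main program. This is only a minimal sketch: the class name ClasspathCheck and the method isOnClasspath are my own hypothetical helpers, not part of the project, and the two class names it looks for are simply the ones imported by OCR_PDF_Test.java.

```java
// Hypothetical helper (not part of the original project): reports whether
// the PDFBox classes used in this post can be loaded from the class path.
final class ClasspathCheck {

    // Returns true if the named class can be loaded, false otherwise.
    // The class is loaded but not initialized, so no static code runs.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // LinkageError covers the case where the class is present
            // but one of the jars it depends on is missing.
            return false;
        }
    }

    public static void main(String[] args) {
        // The two PDFBox classes imported by OCR_PDF_Test.java
        String[] required = {
            "org.apache.pdfbox.pdmodel.PDDocument",
            "org.apache.pdfbox.text.PDFTextStripper"
        };
        for (String name : required) {
            System.out.println(name + (isOnClasspath(name) ? " : found" : " : MISSING"));
        }
    }
}
```

Run it from the same Eclipse project as the main program. If a line says MISSING, the jar is probably not on the build path yet.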
3. OCR_PDF_Test.java
You can use the code below to read a PDF:
    package jinujawad.com;

    import java.io.File;

    import org.apache.pdfbox.pdmodel.PDDocument;
    import org.apache.pdfbox.text.PDFTextStripper;

    public class OCR_PDF_Test {
        public static void main(String[] args) {
            File pdf_file = new File("C:/Users/MIRITPC/Desktop/pdfbox/Read.pdf");
            // PDDocument.load(File) is the PDFBox 2.x API (PDFBox 3.x uses Loader.loadPDF).
            // try-with-resources closes the document even if extraction fails.
            try (PDDocument document = PDDocument.load(pdf_file)) {
                PDFTextStripper pdfstripper = new PDFTextStripper();
                // Extract the text of all pages as a single string
                String ocr_text = pdfstripper.getText(document);
                System.out.println(ocr_text);
            } catch (Exception e) {
                System.out.println(e);
            }
        }
    }
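The path to Read.pdf is hard-coded in the program above, so it only works on my machine. One possible way around that is to take the path from the command line and check that the file exists before handing it to PDFBox. The sketch below uses only the standard library. PdfPathArg and resolvePdf are hypothetical names I made up for illustration, and the fallback path is just the one from the program above.

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper (not part of the original project): picks the PDF
// path from the command-line arguments and validates it.
final class PdfPathArg {

    // Uses args[0] if present, otherwise fallbackPath.
    // Throws IllegalArgumentException if the path is not an existing regular file.
    static File resolvePdf(String[] args, String fallbackPath) {
        String raw = (args != null && args.length > 0) ? args[0] : fallbackPath;
        Path path = Paths.get(raw);
        if (!Files.isRegularFile(path)) {
            throw new IllegalArgumentException("Not a file: " + path.toAbsolutePath());
        }
        return path.toFile();
    }

    public static void main(String[] args) {
        try {
            File pdf_file = resolvePdf(args, "C:/Users/MIRITPC/Desktop/pdfbox/Read.pdf");
            System.out.println("Will read: " + pdf_file.getAbsolutePath());
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In OCR_PDF_Test you could then replace the new File(...) line with a call to resolvePdf(args, ...) and pass the PDF path as a program argument in the Eclipse run configuration.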
4. The console output after reading the PDF content is shown below.