PDFs
This section covers a few libraries that can be used to work with PDF files.
pypdf
Perform common tasks like merging, splitting and adding watermarks to PDF documents.
PDFMiner.six
Get detailed access to the internal structure of PDF documents, such as extracting text with precise positioning, extracting images, or navigating complex document structures.
fpdf2
Create PDF files from scratch, which is what this library is specifically designed for.
Usage
pypdf
from pypdf import PdfReader
reader = PdfReader("example.pdf")
print(len(reader.pages))
PDFMiner.six
from pdfminer.high_level import extract_text
print(extract_text('samples/simple1.pdf'))
fpdf2
from fpdf import FPDF
pdf = FPDF()
pdf.add_page()
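# Helvetica is a built-in core font; "Arial" is only a deprecated alias for it in fpdf2
pdf.set_font("Helvetica", size=25)
# Create a centered cell, then move to the start of the next line
# (new_x/new_y replace the deprecated ln=1; the text is passed positionally)
pdf.cell(200, 10, "Hello World!", new_x="LMARGIN", new_y="NEXT", align="C")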
pdf.output("info.pdf")
AI/LLMs are quite good with PDFs.