PyPDF2 is a commonly used Python library for working with PDF files. It can split, merge, rotate, and encrypt PDFs, among other operations. PyPDF2 is designed for manipulating existing PDFs rather than creating new ones from scratch; for PDF creation, a library such as ReportLab may be more suitable. The examples use the `PdfReader`/`PdfWriter` API of PyPDF2 3.x; the project has since continued under the name pypdf, which keeps the class and method names used here. Below is a concise reference guide to common PyPDF2 use cases, formatted in Markdown:
# PyPDF2 Reference Guide

## Installation

```bash
pip install PyPDF2
```
## Reading PDFs

```python
from PyPDF2 import PdfReader

# Open a PDF file
reader = PdfReader('path_to_file.pdf')

# Get the number of pages
num_pages = len(reader.pages)

# Read the text of a specific page (pages are zero-indexed)
page = reader.pages[0]
text = page.extract_text()
print(text)
```
## Merging PDFs

```python
from PyPDF2 import PdfReader, PdfWriter

# Create a PDF writer object
writer = PdfWriter()

# Open the first PDF and add all its pages to the writer
reader1 = PdfReader('path_to_first_file.pdf')
for page in reader1.pages:
    writer.add_page(page)

# Open the second PDF and add all its pages to the writer
reader2 = PdfReader('path_to_second_file.pdf')
for page in reader2.pages:
    writer.add_page(page)

# Write out the merged PDF
with open('merged_file.pdf', 'wb') as out:
    writer.write(out)
```
## Splitting PDFs

```python
from PyPDF2 import PdfReader, PdfWriter


# Split a PDF into one single-page PDF per page
def split_pdf(path):
    reader = PdfReader(path)
    for page_number, page in enumerate(reader.pages, start=1):
        writer = PdfWriter()
        writer.add_page(page)
        # Output files are numbered from 1: page_1.pdf, page_2.pdf, ...
        output_filename = f'page_{page_number}.pdf'
        with open(output_filename, 'wb') as output_pdf:
            writer.write(output_pdf)


split_pdf('path_to_file.pdf')
```
## Rotating Pages

```python
from PyPDF2 import PdfReader, PdfWriter

# Open the PDF
reader = PdfReader('path_to_file.pdf')
writer = PdfWriter()

# Rotate the first page by 90 degrees clockwise
# (the angle must be a multiple of 90)
page = reader.pages[0].rotate(90)
writer.add_page(page)

# Add the rest of the pages without rotation
for page in reader.pages[1:]:
    writer.add_page(page)

# Save the rotated PDF
with open('rotated_file.pdf', 'wb') as out:
    writer.write(out)
```
## Encrypting PDFs

```python
from PyPDF2 import PdfReader, PdfWriter

# Open the PDF
reader = PdfReader('path_to_file.pdf')
writer = PdfWriter()

# Add all pages to the writer
for page in reader.pages:
    writer.add_page(page)

# Encrypt the PDF with a user password
writer.encrypt('password')

# Save the encrypted PDF
with open('encrypted_file.pdf', 'wb') as out:
    writer.write(out)
```
## Extracting Information from PDFs

```python
from PyPDF2 import PdfReader

reader = PdfReader('path_to_file.pdf')

# Metadata
metadata = reader.metadata
print(metadata)

# Number of pages
print(len(reader.pages))

# Extract text from all pages
for page in reader.pages:
    text = page.extract_text()
    print(text)
```
This guide covers the basics of PDF manipulation with PyPDF2: reading, merging, splitting, rotating, and encrypting files, and extracting metadata and text. Because PyPDF2 focuses on manipulating existing PDFs, it may fall short on complex documents or when creating PDFs from scratch. For those cases, consider another library such as ReportLab, or a combination of libraries.