the_information_nexus/tech_docs/python/PyPDF2.md
2024-05-01 12:28:44 -06:00

# `PyPDF2` Reference Guide
`PyPDF2` is a commonly used Python library for working with existing PDF files. It can read text and metadata, split and merge documents, rotate pages, and encrypt files. It is built for manipulating PDFs rather than generating new ones from scratch. For creating documents with custom layout, a library such as `ReportLab` is usually a better fit. The sections below give a concise reference for the most common `PyPDF2` tasks.
## Installation
```shell
pip install PyPDF2
```
## Reading PDFs
```python
from PyPDF2 import PdfReader
# Open a PDF file
reader = PdfReader('path_to_file.pdf')
# Get the number of pages
num_pages = len(reader.pages)
# Read a specific page's text
page = reader.pages[0]
text = page.extract_text()
print(text)
```
## Merging PDFs
```python
from PyPDF2 import PdfReader, PdfWriter
# Create a PDF writer object
writer = PdfWriter()
# Open the first PDF
reader1 = PdfReader('path_to_first_file.pdf')
# Add all its pages to the writer
for page in reader1.pages:
    writer.add_page(page)
# Open the second PDF
reader2 = PdfReader('path_to_second_file.pdf')
# Add all its pages to the writer
for page in reader2.pages:
    writer.add_page(page)
# Write out the merged PDF
with open('merged_file.pdf', 'wb') as out:
    writer.write(out)
```
## Splitting PDFs
```python
from PyPDF2 import PdfReader, PdfWriter
# Function to split a PDF into one PDF per page
def split_pdf(path):
    reader = PdfReader(path)
    for page_number in range(len(reader.pages)):
        writer = PdfWriter()
        writer.add_page(reader.pages[page_number])
        output_filename = f'page_{page_number+1}.pdf'
        with open(output_filename, 'wb') as output_pdf:
            writer.write(output_pdf)

split_pdf('path_to_file.pdf')
```
## Rotating Pages
```python
from PyPDF2 import PdfReader, PdfWriter
# Open the PDF
reader = PdfReader('path_to_file.pdf')
writer = PdfWriter()
# Rotate the first page by 90 degrees
page = reader.pages[0].rotate(90) # Rotate clockwise
writer.add_page(page)
# Add the rest of the pages without rotation
for page in reader.pages[1:]:
    writer.add_page(page)
# Save the rotated PDF
with open('rotated_file.pdf', 'wb') as out:
    writer.write(out)
```
## Encrypting PDFs
```python
from PyPDF2 import PdfReader, PdfWriter
# Open the PDF
reader = PdfReader('path_to_file.pdf')
writer = PdfWriter()
# Add all pages to the writer
for page in reader.pages:
    writer.add_page(page)
# Encrypt the PDF
writer.encrypt('password')
# Save the encrypted PDF
with open('encrypted_file.pdf', 'wb') as out:
    writer.write(out)
```
## Extracting Information from PDFs
```python
from PyPDF2 import PdfReader
reader = PdfReader('path_to_file.pdf')
# Metadata
metadata = reader.metadata
print(metadata)
# Number of pages
print(len(reader.pages))
# Extract text from all pages
for page in reader.pages:
    text = page.extract_text()
    print(text)
```
This guide covers the basics of `PyPDF2`: reading, merging, splitting, rotating, and encrypting PDF files, plus extracting their text and metadata. Because `PyPDF2` works on existing documents, it is not a tool for building PDFs from scratch. Its text extraction can also be incomplete for complex layouts, and it returns no text for scanned, image-only pages. For PDF generation, `ReportLab` is more appropriate, and more demanding workflows may require combining several libraries.