`PyPDF2` is a commonly used Python library for working with PDF files. It can split, merge, rotate, and encrypt PDFs, among other operations. It manipulates existing PDFs rather than creating new ones from scratch; for PDF creation, a library such as `ReportLab` may be more suitable. Below is a concise reference guide for common `PyPDF2` use cases.

# `PyPDF2` Reference Guide

## Installation

```
pip install PyPDF2
```

## Reading PDFs

```python
from PyPDF2 import PdfReader

# Open a PDF file
reader = PdfReader('path_to_file.pdf')

# Get the number of pages
num_pages = len(reader.pages)

# Read a specific page's text
page = reader.pages[0]
text = page.extract_text()
print(text)
```

## Merging PDFs

```python
from PyPDF2 import PdfReader, PdfWriter

# Create a PDF writer object
writer = PdfWriter()

# Open the first PDF
reader1 = PdfReader('path_to_first_file.pdf')
# Add all its pages to the writer
for page in reader1.pages:
    writer.add_page(page)

# Open the second PDF
reader2 = PdfReader('path_to_second_file.pdf')
# Add all its pages to the writer
for page in reader2.pages:
    writer.add_page(page)

# Write out the merged PDF
with open('merged_file.pdf', 'wb') as out:
    writer.write(out)
```

## Splitting PDFs

```python
from PyPDF2 import PdfReader, PdfWriter

# Function to split a PDF into one PDF per page
def split_pdf(path):
    reader = PdfReader(path)
    for page_number in range(len(reader.pages)):
        writer = PdfWriter()
        writer.add_page(reader.pages[page_number])
        output_filename = f'page_{page_number+1}.pdf'

        with open(output_filename, 'wb') as output_pdf:
            writer.write(output_pdf)

split_pdf('path_to_file.pdf')
```

## Rotating Pages

```python
from PyPDF2 import PdfReader, PdfWriter

# Open the PDF
reader = PdfReader('path_to_file.pdf')
writer = PdfWriter()

# Rotate the first page by 90 degrees
page = reader.pages[0].rotate(90)  # Rotate clockwise
writer.add_page(page)

# Add the rest of the pages without rotation
for page in reader.pages[1:]:
    writer.add_page(page)

# Save the rotated PDF
with open('rotated_file.pdf', 'wb') as out:
    writer.write(out)
```

## Encrypting PDFs

```python
from PyPDF2 import PdfReader, PdfWriter

# Open the PDF
reader = PdfReader('path_to_file.pdf')
writer = PdfWriter()

# Add all pages to the writer
for page in reader.pages:
    writer.add_page(page)

# Encrypt the PDF
writer.encrypt('password')

# Save the encrypted PDF
with open('encrypted_file.pdf', 'wb') as out:
    writer.write(out)
```

## Extracting Information from PDFs

```python
from PyPDF2 import PdfReader

reader = PdfReader('path_to_file.pdf')

# Metadata
metadata = reader.metadata
print(metadata)

# Number of pages
print(len(reader.pages))

# Extract text from all pages
for page in reader.pages:
    text = page.extract_text()
    print(text)
```

This guide covers the basics of `PyPDF2`: reading, merging, splitting, rotating, and encrypting PDF files, plus extracting metadata and text. Because `PyPDF2` focuses on manipulating existing PDFs, it may fall short with complex documents or when creating PDFs from scratch. For those cases, another library such as `ReportLab`, or a combination of libraries, may be more appropriate.