For working with PDF files in Python, `PyPDF2` is a commonly used library. It can split, merge, rotate, and encrypt PDFs, among other operations. `PyPDF2` manipulates existing PDFs rather than creating new ones from scratch; for PDF creation, a library such as `ReportLab` is likely a better fit. Note that development of `PyPDF2` ended with the 3.0.x releases and continues under the name `pypdf`, which also provides the `PdfReader` and `PdfWriter` classes used here. Below is a concise reference guide for common use cases with `PyPDF2`.

# `PyPDF2` Reference Guide

## Installation

```
pip install PyPDF2
```

## Reading PDFs

```python
from PyPDF2 import PdfReader

# Open a PDF file
reader = PdfReader('path_to_file.pdf')

# Get the number of pages
num_pages = len(reader.pages)

# Read a specific page's text (pages are zero-indexed)
page = reader.pages[0]
text = page.extract_text()
print(text)
```

## Merging PDFs

```python
from PyPDF2 import PdfReader, PdfWriter

# Create a PDF writer object
writer = PdfWriter()

# Open the first PDF and add all its pages to the writer
reader1 = PdfReader('path_to_first_file.pdf')
for page in reader1.pages:
    writer.add_page(page)

# Open the second PDF and add all its pages to the writer
reader2 = PdfReader('path_to_second_file.pdf')
for page in reader2.pages:
    writer.add_page(page)

# Write out the merged PDF
with open('merged_file.pdf', 'wb') as out:
    writer.write(out)
```

## Splitting PDFs

```python
from PyPDF2 import PdfReader, PdfWriter


def split_pdf(path):
    """Split a PDF into one single-page PDF per page."""
    reader = PdfReader(path)
    # Number output files from 1 so they match the visible page numbers
    for page_number, page in enumerate(reader.pages, start=1):
        writer = PdfWriter()
        writer.add_page(page)
        output_filename = f'page_{page_number}.pdf'
        with open(output_filename, 'wb') as output_pdf:
            writer.write(output_pdf)


split_pdf('path_to_file.pdf')
```

## Rotating Pages

```python
from PyPDF2 import PdfReader, PdfWriter

# Open the PDF
reader = PdfReader('path_to_file.pdf')
writer = PdfWriter()

# Rotate the first page by 90 degrees clockwise
# (the angle must be a multiple of 90)
page = reader.pages[0].rotate(90)
writer.add_page(page)

# Add the rest of the pages without rotation
for page in reader.pages[1:]:
    writer.add_page(page)

# Save the rotated PDF
with open('rotated_file.pdf', 'wb') as out:
    writer.write(out)
```

## Encrypting PDFs

```python
from PyPDF2 import PdfReader, PdfWriter

# Open the PDF
reader = PdfReader('path_to_file.pdf')
writer = PdfWriter()

# Add all pages to the writer
for page in reader.pages:
    writer.add_page(page)

# Encrypt the PDF with a user password
writer.encrypt('password')

# Save the encrypted PDF
with open('encrypted_file.pdf', 'wb') as out:
    writer.write(out)
```

## Extracting Information from PDFs

```python
from PyPDF2 import PdfReader

reader = PdfReader('path_to_file.pdf')

# Metadata
metadata = reader.metadata
print(metadata)

# Number of pages
print(len(reader.pages))

# Extract text from all pages
for page in reader.pages:
    text = page.extract_text()
    print(text)
```

This guide covers the basics of `PyPDF2`: reading, merging, splitting, rotating, and encrypting PDF files. `PyPDF2` may have limitations with complex PDFs, and it is not designed for creating documents from scratch. For those needs, consider `ReportLab` or a combination of libraries.
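
## Splitting into Fixed-Size Chunks

The per-page split in the Splitting PDFs section can be generalized to chunks of several pages. The following is a minimal sketch under stated assumptions, not part of `PyPDF2` itself: `chunk_ranges` and `split_pdf_in_chunks` are hypothetical helper names, and the output filename pattern is an arbitrary choice. The page-range arithmetic lives in a pure function so it can be checked without a PDF on hand.

```python
def chunk_ranges(num_pages, chunk_size):
    """Return (start, stop) index pairs covering num_pages in chunks of chunk_size.

    Indices are zero-based and stop is exclusive, matching Python's range().
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [
        (start, min(start + chunk_size, num_pages))
        for start in range(0, num_pages, chunk_size)
    ]


def split_pdf_in_chunks(path, chunk_size):
    """Write consecutive PDFs of up to chunk_size pages each (hypothetical helper)."""
    # Imported here so chunk_ranges can be used without PyPDF2 installed.
    from PyPDF2 import PdfReader, PdfWriter

    reader = PdfReader(path)
    output_filenames = []
    for start, stop in chunk_ranges(len(reader.pages), chunk_size):
        writer = PdfWriter()
        for index in range(start, stop):
            writer.add_page(reader.pages[index])
        # Assumed naming scheme: 1-based, inclusive page numbers.
        output_filename = f"pages_{start + 1}-{stop}.pdf"
        with open(output_filename, "wb") as output_pdf:
            writer.write(output_pdf)
        output_filenames.append(output_filename)
    return output_filenames
```

For a 10-page file, `chunk_ranges(10, 4)` yields `[(0, 4), (4, 8), (8, 10)]`, so `split_pdf_in_chunks('path_to_file.pdf', 4)` would write `pages_1-4.pdf`, `pages_5-8.pdf`, and `pages_9-10.pdf`.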