# Pattern Matching

---

### **Expanded Key Takeaways: Choosing the Right Tool for Pattern Matching**

Regular expressions (regex) are powerful, but they are not the best tool for every text-processing task. Below is an **expanded breakdown** of when to use regex and when to reach for alternatives, with context and real-world examples.

---

## **1. Regex Is Best for Medium-Complexity Text Patterns**
**Context**:
- Regex excels at flexible, rule-based matching (e.g., email validation, log filtering).
- For moderately complex cases, it strikes a good balance between expressiveness and readability.

**When to Use**:
- ✔ Extracting structured data (e.g., `\d{3}-\d{2}-\d{4}` for SSNs).
- ✔ Finding variable patterns (e.g., `https?://[^\s]+` for URLs).
- ✔ Replacing substrings that follow a rule (e.g., `re.sub(r'\bcolour\b', 'color', text)`, the Python equivalent of sed's `s/\bcolour\b/color/g`).

**Limitations**:
- ❌ Becomes hard to read and maintain as rules grow (e.g., deeply nested groups and alternations).
- ❌ Cannot match arbitrarily nested (recursive) structures such as nested HTML tags. PCRE and Python's third-party `regex` module add recursion extensions, but the resulting patterns are difficult to maintain.

**Example**:
```python
import re
# Extract phone numbers in the format (XXX) XXX-XXXX
text = "Call (123) 456-7890 or (987) 654-3210"
phones = re.findall(r'\(\d{3}\) \d{3}-\d{4}', text)
# Result: ['(123) 456-7890', '(987) 654-3210']
```

---

## **2. For Simple Tasks, Built-in String Methods Are Cleaner**
**Context**:
- If the task is **exact matching** or **fixed-format parsing**, skip regex: string methods are easier to read and avoid the overhead of the regex engine.

**When to Use**:
- ✔ Checking prefixes/suffixes (`str.startswith()`, `str.endswith()`).
- ✔ Exact substring search (the `in` operator, `str.find()`).
- ✔ Splitting on fixed delimiters (`str.split(',')`).

**Example**:
```python
# Check whether a filename ends with .csv (simpler than regex)
filename = "data_2024.csv"
if filename.endswith(".csv"):
    print("CSV file detected.")
```

---

## **3. For Recursive/Nested Patterns, Use Grammars or Parsers**
**Context**:
- Classic regex **cannot** handle recursive structures (e.g., JSON, XML, nested math expressions).
- Use **formal grammars** (e.g., context-free grammars), **parser combinators**, or an existing parser for the format instead.

**When to Use**:
- ✔ Parsing programming languages.
- ✔ Extracting nested data (e.g., HTML/XML).
- ✔ Validating structured documents.

**Example (Using `lxml` for HTML)**:
```python
from lxml import html
# Parse an HTML fragment; lxml builds the nested element tree
doc = html.fromstring("<div><p>Hello world</p></div>")
# Navigate the tree with XPath instead of matching tags with a regex
print(doc.xpath("//p/text()"))
# Result: ['Hello world']
```
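
To make the regex-versus-grammar split concrete, here is a minimal Python sketch (not from the original text) of a recursive-descent parser for a small context-free grammar: integer arithmetic with `+ - * /` and parentheses. The grammar and the names `tokenize`, `Parser`, and `evaluate` are illustrative assumptions. A regex is used only to split the input into flat tokens; the recursive call in `factor()` is what handles arbitrarily deep parentheses, which classic regex cannot match.

```python
import re
from typing import List, Optional, Union

# Lexical level: flat tokens are exactly what regex handles well
TOKEN_RE = re.compile(r"\d+|[-+*/()]")

Number = Union[int, float]


def tokenize(source: str) -> List[str]:
    """Split the input into integer, operator, and parenthesis tokens."""
    tokens: List[str] = []
    pos = 0
    while pos < len(source):
        if source[pos].isspace():
            pos += 1
            continue
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at position {pos}")
        tokens.append(match.group())
        pos = match.end()
    return tokens


class Parser:
    """Recursive-descent parser for an illustrative grammar:

        expr   := term (("+" | "-") term)*
        term   := factor (("*" | "/") factor)*
        factor := NUMBER | "(" expr ")"
    """

    def __init__(self, tokens: List[str]) -> None:
        self.tokens = tokens
        self.pos = 0

    def peek(self) -> Optional[str]:
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self) -> str:
        token = self.peek()
        if token is None:
            raise SyntaxError("unexpected end of input")
        self.pos += 1
        return token

    def parse(self) -> Number:
        value = self.expr()
        if self.peek() is not None:
            raise SyntaxError(f"unexpected token {self.peek()!r}")
        return value

    def expr(self) -> Number:
        value = self.term()
        while self.peek() in ("+", "-"):
            op = self.advance()
            rhs = self.term()
            value = value + rhs if op == "+" else value - rhs
        return value

    def term(self) -> Number:
        value = self.factor()
        while self.peek() in ("*", "/"):
            op = self.advance()
            rhs = self.factor()
            value = value * rhs if op == "*" else value / rhs
        return value

    def factor(self) -> Number:
        token = self.advance()
        if token == "(":
            # Recursion is what lets the parser handle any nesting depth
            value = self.expr()
            if self.advance() != ")":
                raise SyntaxError("expected ')'")
            return value
        if token.isdigit():
            return int(token)
        raise SyntaxError(f"unexpected token {token!r}")


def evaluate(source: str) -> Number:
    return Parser(tokenize(source)).parse()


print(evaluate("2 * (3 + (4 - 1))"))  # 12
```

Unary minus and floating-point literals are deliberately left out to keep the grammar short. For established formats such as JSON or HTML, a ready-made parser (Python's built-in `json` module, or `lxml` for HTML) is usually a better choice than writing one by hand.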