Add tech_docs/database/OLTPvsOLAP.md

2025-06-22 08:06:30 +00:00
parent fa80eca536
commit 74233e4abe
1 changed files with 120 additions and 0 deletions
--- a/tech_docs/database/OLTPvsOLAP.md
+++ b/tech_docs/database/OLTPvsOLAP.md
@@ -0,0 +1,120 @@
+Great idea! Let’s use the **library database** example for both **normalized (OLTP)** and **denormalized (OLAP)** models so we can compare them directly.  
+
+---
+
+### **Library Database Example**  
+
+#### **1. Normalized (OLTP) – Optimized for Transactions**  
+**Goal:** Efficiently manage book checkouts, returns, and updates while minimizing redundancy.  
+
+**Tables (3NF Structure):**  
+```plaintext
+Books Table  
+---------+------------+-----------+  
+| BookID  | Title      | AuthorID  |  
+---------+------------+-----------+  
+
+Authors Table  
+-----------+------------+  
+| AuthorID  | AuthorName |  
+-----------+------------+  
+
+Borrowers Table  
+-------------+--------------+  
+| BorrowerID  | BorrowerName |  
+-------------+--------------+  
+
+Loans Table (Transactions)  
+-----------+-------------+---------+------------+------------+  
+| LoanID    | BorrowerID  | BookID  | LoanDate   | ReturnDate |  
+-----------+-------------+---------+------------+------------+  
+```
+**Why Normalized?**  
+- No duplicate data (e.g., author names stored only once).  
+- Easy to update (e.g., change an author’s name in one place).  
+- Optimized for **fast writes** (recording loans, returns).  
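+
+A minimal sketch of the normalized tables above as DDL, assuming generic ANSI-style SQL. The column types and constraints are assumptions (the tables above give only column names), and the `AuthorID = 1` value is hypothetical:
+
+```sql
+-- Column types and constraints are assumed for illustration.
+CREATE TABLE Authors (
+    AuthorID   INTEGER PRIMARY KEY,
+    AuthorName VARCHAR(200) NOT NULL
+);
+
+CREATE TABLE Books (
+    BookID   INTEGER PRIMARY KEY,
+    Title    VARCHAR(300) NOT NULL,
+    AuthorID INTEGER NOT NULL REFERENCES Authors (AuthorID)
+);
+
+CREATE TABLE Borrowers (
+    BorrowerID   INTEGER PRIMARY KEY,
+    BorrowerName VARCHAR(200) NOT NULL
+);
+
+CREATE TABLE Loans (
+    LoanID     INTEGER PRIMARY KEY,
+    BorrowerID INTEGER NOT NULL REFERENCES Borrowers (BorrowerID),
+    BookID     INTEGER NOT NULL REFERENCES Books (BookID),
+    LoanDate   DATE NOT NULL,
+    ReturnDate DATE              -- assumed NULL until the book comes back
+);
+
+-- Renaming an author touches exactly one row, however many books or loans exist.
+UPDATE Authors
+SET AuthorName = 'Joanne Rowling'
+WHERE AuthorID = 1;              -- hypothetical ID
+```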
+
+**Problem for Analytics:**  
+- To find _"How many books by Author X were borrowed last month?"_, you must **join 3 tables** (`Loans`, `Books`, `Authors`):  
+  ```sql
+  SELECT COUNT(*) 
+  FROM Loans  
+  JOIN Books ON Loans.BookID = Books.BookID  
+  JOIN Authors ON Books.AuthorID = Authors.AuthorID  
+  WHERE Authors.AuthorName = 'J.K. Rowling' 
+    AND Loans.LoanDate BETWEEN '2023-01-01' AND '2023-01-31';
+  ```
+  → Every loan must be resolved through `Books` to reach `Authors`, which can become slow on large datasets.  
+
+---
+
+#### **2. Denormalized (OLAP) – Optimized for Analytics**  
+**Goal:** Speed up queries for reports like _"Top borrowed authors by month."_  
+
+**Star Schema Structure:**  
+```plaintext
+Fact_Loans (Central Fact Table)  
+--------+-------------+---------+-----------+------------+------------+  
+| LoanID | BorrowerID  | BookID  | AuthorID  | LoanDate   | ReturnDate |  
+--------+-------------+---------+-----------+------------+------------+  
+
+Dim_Books (Dimension Table)  
+---------+------------+-----------------+  
+| BookID  | Title      | Genre           |  
+---------+------------+-----------------+  
+
+Dim_Authors (Dimension Table)  
+-----------+------------+----------------+  
+| AuthorID  | AuthorName | Nationality    |  
+-----------+------------+----------------+  
+
+Dim_Borrowers (Dimension Table)  
+-------------+--------------+---------------+  
+| BorrowerID  | BorrowerName | MembershipTier|  
+-------------+--------------+---------------+  
+
+Dim_Time (Dimension Table)  
+------------+-------+---------+------+  
+| LoanDate   | Month | Quarter | Year |  
+------------+-------+---------+------+  
+```
+**Key Denormalizations:**  
+1. **Fact_Loans** includes **AuthorID** (redundant, since in the normalized model it can already be derived from `BookID` via `Books`).  
+2. **Dim_Time** pre-calculates month/quarter/year for faster filtering.  
+3. **No strict foreign keys** (constraints are often left unenforced so bulk loads and reads stay fast; integrity is typically checked when the data is loaded).  
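+
+As a rough sketch of denormalization 1, the statement below copies `AuthorID` onto each fact row at load time. It assumes the normalized tables and the star-schema tables are reachable from the same connection, which a real setup may not provide:
+
+```sql
+-- Resolve the author once, during the load, so analytic queries can skip Books.
+INSERT INTO Fact_Loans (LoanID, BorrowerID, BookID, AuthorID, LoanDate, ReturnDate)
+SELECT l.LoanID,
+       l.BorrowerID,
+       l.BookID,
+       b.AuthorID,      -- redundant copy taken from the normalized Books table
+       l.LoanDate,
+       l.ReturnDate
+FROM Loans AS l
+JOIN Books AS b ON b.BookID = l.BookID;
+```
+
+The join through `Books` still happens, but only once per load instead of once per report query.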
+
+**Same Query, But Faster:**  
+```sql
+SELECT COUNT(*)  
+FROM Fact_Loans  
+JOIN Dim_Authors ON Fact_Loans.AuthorID = Dim_Authors.AuthorID  
+WHERE Dim_Authors.AuthorName = 'J.K. Rowling'  
+  AND Fact_Loans.LoanDate BETWEEN '2023-01-01' AND '2023-01-31';
+```
+→ One join across 2 tables instead of two joins across 3 → **Better performance for analytics**.  
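+
+The stated goal of this model, _"Top borrowed authors by month"_, could be sketched as follows. This assumes `LoanDate` holds a plain date (no time component) so that it matches `Dim_Time.LoanDate` exactly; some dialects may also require quoting `Year` and `Month` as identifiers:
+
+```sql
+-- Loans per author per month, using the pre-calculated Dim_Time columns.
+SELECT t.Year,
+       t.Month,
+       a.AuthorName,
+       COUNT(*) AS LoanCount
+FROM Fact_Loans AS f
+JOIN Dim_Authors AS a ON f.AuthorID = a.AuthorID
+JOIN Dim_Time    AS t ON f.LoanDate = t.LoanDate
+GROUP BY t.Year, t.Month, a.AuthorName
+ORDER BY t.Year, t.Month, LoanCount DESC;
+```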
+
+---
+
+### **Side-by-Side Comparison (Library Example)**  
+| Feature               | Normalized (OLTP)                          | Denormalized (OLAP)                     |  
+|-----------------------|-------------------------------------------|-----------------------------------------|  
+| **Structure**         | 4 tables (Books, Authors, Borrowers, Loans) | Star schema: 1 fact table + 4 dimensions |  
+| **Author Data**       | Only in `Authors` (linked via `Books`)     | `AuthorID` duplicated in `Fact_Loans` for speed; details in `Dim_Authors` |  
+| **Query Complexity**  | Needs multi-table joins                   | Fewer joins (optimized for reads)       |  
+| **Update Efficiency** | High (each value stored once)             | Slow (batch updates, redundant data)    |  
+| **Use Case**          | Daily operations (checkouts, returns)     | Monthly reports, trend analysis        |  
+
+---
+
+### **Key Takeaways**  
+1. **Normalized (OLTP):**  
+   - Like a **library’s backend system**—efficient for daily transactions.  
+   - "Where is this book? Who borrowed it?" → Fast updates, strict integrity.  
+
+2. **Denormalized (OLAP):**  
+   - Like a **library’s annual report**—optimized for questions like:  
+     - _"Which genre is most popular?"_  
+     - _"Do VIP members borrow more books?"_  
+   - Redundancy is **intentional** to reduce joins.  
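+
+The two example questions might be answered against the star schema roughly like this. Both are sketches; the second only counts borrowers who have at least one loan, which is an assumption about what "borrow more" should mean:
+
+```sql
+-- Which genre is most popular?
+SELECT b.Genre,
+       COUNT(*) AS LoanCount
+FROM Fact_Loans AS f
+JOIN Dim_Books AS b ON f.BookID = b.BookID
+GROUP BY b.Genre
+ORDER BY LoanCount DESC;
+
+-- Do some membership tiers borrow more books per borrower?
+SELECT br.MembershipTier,
+       COUNT(*) * 1.0 / COUNT(DISTINCT f.BorrowerID) AS LoansPerBorrower
+FROM Fact_Loans AS f
+JOIN Dim_Borrowers AS br ON f.BorrowerID = br.BorrowerID
+GROUP BY br.MembershipTier
+ORDER BY LoansPerBorrower DESC;
+```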
+
+Would you like to see how an **ETL pipeline** transforms the normalized library OLTP data into a denormalized OLAP star schema?