diff --git a/tech_docs/database/OLTPvsOLAP.md b/tech_docs/database/OLTPvsOLAP.md
new file mode 100644
index 0000000..5ca6008
--- /dev/null
+++ b/tech_docs/database/OLTPvsOLAP.md
@@ -0,0 +1,120 @@
+Let’s use the **library database** example for both **normalized (OLTP)** and **denormalized (OLAP)** models so we can compare them directly.
+
+---
+
+### **Library Database Example**
+
+#### **1. Normalized (OLTP) – Optimized for Transactions**
+**Goal:** Efficiently manage book checkouts, returns, and updates while minimizing redundancy.
+
+**Tables (3NF Structure):**
+```plaintext
+Books Table
++---------+------------+-----------+
+| BookID  | Title      | AuthorID  |
++---------+------------+-----------+
+
+Authors Table
++-----------+------------+
+| AuthorID  | AuthorName |
++-----------+------------+
+
+Borrowers Table
++-------------+--------------+
+| BorrowerID  | BorrowerName |
++-------------+--------------+
+
+Loans Table (Transactions)
++-----------+-------------+---------+------------+------------+
+| LoanID    | BorrowerID  | BookID  | LoanDate   | ReturnDate |
++-----------+-------------+---------+------------+------------+
+```
+**Why Normalized?**
+- No duplicate data (e.g., each author name is stored only once).
+- Easy to update (e.g., change an author’s name in one place).
+- Optimized for **fast writes** (recording loans, returns).
+
+**Problem for Analytics:**
+- To find _"How many books by Author X were borrowed last month?"_, you must **join 3 tables** (`Loans`, `Books`, `Authors`):
+  ```sql
+  SELECT COUNT(*)
+  FROM Loans
+  JOIN Books ON Loans.BookID = Books.BookID
+  JOIN Authors ON Books.AuthorID = Authors.AuthorID
+  WHERE Authors.AuthorName = 'J.K. Rowling'
+    AND Loans.LoanDate BETWEEN '2023-01-01' AND '2023-01-31';
+  ```
+  → Every report pays for both joins, which gets slower as `Loans` grows.
+
+---
+
+#### **2. Denormalized (OLAP) – Optimized for Analytics**
+**Goal:** Speed up queries for reports like _"Top borrowed authors by month."_
+
+**Star Schema Structure:**
+```plaintext
+Fact_Loans (Central Fact Table)
++--------+-------------+---------+-----------+------------+------------+
+| LoanID | BorrowerID  | BookID  | AuthorID  | LoanDate   | ReturnDate |
++--------+-------------+---------+-----------+------------+------------+
+
+Dim_Books (Dimension Table)
++---------+------------+-----------------+
+| BookID  | Title      | Genre           |
++---------+------------+-----------------+
+
+Dim_Authors (Dimension Table)
++-----------+------------+----------------+
+| AuthorID  | AuthorName | Nationality    |
++-----------+------------+----------------+
+
+Dim_Borrowers (Dimension Table)
++-------------+--------------+----------------+
+| BorrowerID  | BorrowerName | MembershipTier |
++-------------+--------------+----------------+
+
+Dim_Time (Dimension Table)
++------------+-------+---------+------+
+| LoanDate   | Month | Quarter | Year |
++------------+-------+---------+------+
+```
+**Key Denormalizations:**
+1. **Fact_Loans** includes **AuthorID** (redundant, since it can already be derived via `Books.AuthorID`).
+2. **Dim_Time** pre-calculates month/quarter/year for faster filtering.
+3. **No strict foreign keys**: constraints are often left unenforced because the tables are bulk-loaded and tuned for reads, not row-level updates.
+
+**Same Query, But Faster:**
+```sql
+SELECT COUNT(*)
+FROM Fact_Loans
+JOIN Dim_Authors ON Fact_Loans.AuthorID = Dim_Authors.AuthorID
+WHERE Dim_Authors.AuthorName = 'J.K. Rowling'
+  AND Fact_Loans.LoanDate BETWEEN '2023-01-01' AND '2023-01-31';
+```
+→ One join across 2 tables instead of two joins across 3 → **typically better performance for analytics**.
+
+---
+
+### **Side-by-Side Comparison (Library Example)**
+| Feature               | Normalized (OLTP)                           | Denormalized (OLAP)                         |
+|-----------------------|---------------------------------------------|---------------------------------------------|
+| **Structure**         | 4 tables (Books, Authors, Borrowers, Loans) | Star schema: 1 fact table + 4 dimensions    |
+| **Author Data**       | Only in `Authors` (linked via `Books`)      | `AuthorID` duplicated in `Fact_Loans`       |
+| **Query Complexity**  | Needs multi-table joins                     | Fewer joins (optimized for reads)           |
+| **Update Efficiency** | High (each fact is stored once)             | Lower (batch loads, redundant data to sync) |
+| **Use Case**          | Daily operations (checkouts, returns)       | Monthly reports, trend analysis             |
+
+---
+
+### **Key Takeaways**
+1. **Normalized (OLTP):**
+   - Like a **library’s backend system**: efficient for daily transactions.
+   - "Where is this book? Who borrowed it?" → Fast updates, strict integrity.
+
+2. **Denormalized (OLAP):**
+   - Like a **library’s annual report**, optimized for questions like:
+     - _"Which genre is most popular?"_
+     - _"Do VIP members borrow more books?"_
+   - Redundancy is **intentional** to reduce joins.
+
+Next step: an **ETL pipeline** can transform the normalized library OLTP data into the denormalized OLAP star schema.
\ No newline at end of file
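
The two author queries in this file can be checked against each other with a small script. What follows is a minimal sketch, assuming SQLite through Python's standard `sqlite3` module. The sample rows (book titles, borrower names, loan dates) and the helper names `build_oltp`, `build_star`, `count_loans`, and `demo` are made up for illustration and are not part of the committed document. It loads the normalized tables, derives `Fact_Loans` and `Dim_Authors` from them, and confirms that both versions of the query return the same count. It does not measure speed.

```python
import sqlite3

# The two queries from the document, parameterized instead of hard-coded.
AUTHOR_QUERY_OLTP = """
    SELECT COUNT(*)
    FROM Loans
    JOIN Books   ON Loans.BookID = Books.BookID
    JOIN Authors ON Books.AuthorID = Authors.AuthorID
    WHERE Authors.AuthorName = ?
      AND Loans.LoanDate BETWEEN ? AND ?
"""

AUTHOR_QUERY_OLAP = """
    SELECT COUNT(*)
    FROM Fact_Loans
    JOIN Dim_Authors ON Fact_Loans.AuthorID = Dim_Authors.AuthorID
    WHERE Dim_Authors.AuthorName = ?
      AND Fact_Loans.LoanDate BETWEEN ? AND ?
"""


def build_oltp(conn):
    """Create the normalized tables and load hypothetical sample rows."""
    conn.executescript("""
        CREATE TABLE Authors   (AuthorID INTEGER PRIMARY KEY, AuthorName TEXT);
        CREATE TABLE Books     (BookID INTEGER PRIMARY KEY, Title TEXT,
                                AuthorID INTEGER REFERENCES Authors(AuthorID));
        CREATE TABLE Borrowers (BorrowerID INTEGER PRIMARY KEY, BorrowerName TEXT);
        CREATE TABLE Loans     (LoanID INTEGER PRIMARY KEY,
                                BorrowerID INTEGER REFERENCES Borrowers(BorrowerID),
                                BookID INTEGER REFERENCES Books(BookID),
                                LoanDate TEXT, ReturnDate TEXT);
    """)
    # Illustrative data only; dates are ISO strings so BETWEEN compares correctly.
    conn.executemany("INSERT INTO Authors VALUES (?, ?)",
                     [(1, "J.K. Rowling"), (2, "Author B")])
    conn.executemany("INSERT INTO Books VALUES (?, ?, ?)",
                     [(1, "Book A", 1), (2, "Book B", 1), (3, "Book C", 2)])
    conn.executemany("INSERT INTO Borrowers VALUES (?, ?)",
                     [(1, "Borrower X"), (2, "Borrower Y")])
    conn.executemany("INSERT INTO Loans VALUES (?, ?, ?, ?, ?)", [
        (1, 1, 1, "2023-01-05", "2023-01-20"),
        (2, 2, 2, "2023-01-10", None),
        (3, 1, 3, "2023-01-15", "2023-01-29"),
        (4, 2, 1, "2023-02-02", None),
    ])


def build_star(conn):
    """Derive the fact table and one dimension from the normalized tables."""
    conn.executescript("""
        CREATE TABLE Dim_Authors AS
            SELECT AuthorID, AuthorName FROM Authors;
        -- AuthorID is copied onto every loan row: the intentional redundancy.
        CREATE TABLE Fact_Loans AS
            SELECT l.LoanID, l.BorrowerID, l.BookID, b.AuthorID,
                   l.LoanDate, l.ReturnDate
            FROM Loans AS l
            JOIN Books AS b ON l.BookID = b.BookID;
    """)


def count_loans(conn, query, author, start, end):
    """Run one of the two COUNT(*) queries and return the single number."""
    return conn.execute(query, (author, start, end)).fetchone()[0]


def demo():
    conn = sqlite3.connect(":memory:")
    build_oltp(conn)
    build_star(conn)
    args = ("J.K. Rowling", "2023-01-01", "2023-01-31")
    return (count_loans(conn, AUTHOR_QUERY_OLTP, *args),
            count_loans(conn, AUTHOR_QUERY_OLAP, *args))


if __name__ == "__main__":
    print(demo())  # prints (2, 2) for the sample rows above
```

Copying `AuthorID` into `Fact_Loans` at load time is what lets the second query skip the `Books` join. The cost is that the fact table must be reloaded or patched if a book's author assignment ever changes in the source tables.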