Files
the_information_nexus/tech_docs/database/OLTPvsOLAP.md

4.7 KiB
Raw Blame History

Great idea! Lets use the library database example for both normalized (OLTP) and denormalized (OLAP) models so we can compare them directly.


Library Database Example

1. Normalized (OLTP) Optimized for Transactions

Goal: Efficiently manage book checkouts, returns, and updates while minimizing redundancy.

Tables (3NF Structure):

Books Table  
+---------+------------+-----------+  
| BookID  | Title      | AuthorID  |  
+---------+------------+-----------+  

Authors Table  
+-----------+------------+  
| AuthorID  | AuthorName |  
+-----------+------------+  

Borrowers Table  
+-------------+--------------+  
| BorrowerID  | BorrowerName |  
+-------------+--------------+  

Loans Table (Transactions)  
+-----------+-------------+---------+------------+------------+  
| LoanID    | BorrowerID  | BookID  | LoanDate   | ReturnDate |  
+-----------+-------------+---------+------------+------------+  

Why Normalized?

  • No duplicate data (e.g., author names stored only once).
  • Easy to update (e.g., change an authors name in one place).
  • Optimized for fast writes (recording loans, returns).

Problem for Analytics:

  • To find "How many books by Author X were borrowed last month?", you must join 4 tables:
    SELECT COUNT(*) 
    FROM Loans  
    JOIN Books ON Loans.BookID = Books.BookID  
    JOIN Authors ON Books.AuthorID = Authors.AuthorID  
    WHERE Authors.AuthorName = 'J.K. Rowling' 
      AND Loans.LoanDate BETWEEN '2023-01-01' AND '2023-01-31';
    
    → Slow for large datasets.

2. Denormalized (OLAP) Optimized for Analytics

Goal: Speed up queries for reports like "Top borrowed authors by month."

Star Schema Structure:

Fact_Loans (Central Fact Table)  
+--------+-------------+---------+-----------+------------+------------+  
| LoanID | BorrowerID  | BookID  | AuthorID  | LoanDate   | ReturnDate |  
+--------+-------------+---------+-----------+------------+------------+  

Dim_Books (Dimension Table)  
+---------+------------+-----------------+  
| BookID  | Title      | Genre           |  
+---------+------------+-----------------+  

Dim_Authors (Dimension Table)  
+-----------+------------+----------------+  
| AuthorID  | AuthorName | Nationality    |  
+-----------+------------+----------------+  

Dim_Borrowers (Dimension Table)  
+-------------+--------------+---------------+  
| BorrowerID  | BorrowerName | MembershipTier|  
+-------------+--------------+---------------+  

Dim_Time (Dimension Table)  
+------------+-------+-------+------+  
| LoanDate   | Month | Quarter | Year|  
+------------+-------+-------+------+  

Key Denormalizations:

  1. Fact_Loans includes AuthorID (redundant, since its also in Books).
  2. Dim_Time pre-calculates month/quarter/year for faster filtering.
  3. No strict foreign keys (optimized for reads, not updates).

Same Query, But Faster:

SELECT COUNT(*)  
FROM Fact_Loans  
JOIN Dim_Authors ON Fact_Loans.AuthorID = Dim_Authors.AuthorID  
WHERE Dim_Authors.AuthorName = 'J.K. Rowling'  
  AND Fact_Loans.LoanDate BETWEEN '2023-01-01' AND '2023-01-31';

→ Fewer joins (just 2 tables) → Better performance for analytics.


Side-by-Side Comparison (Library Example)

Feature Normalized (OLTP) Denormalized (OLAP)
Structure 4+ tables (Books, Authors, Loans, etc.) Star schema: 1 fact table + dimensions
Author Data Only in Authors (linked via Books) Duplicated in Fact_Loans for speed
Query Complexity Needs multi-table joins Fewer joins (optimized for reads)
Update Efficiency Perfect (no redundancy) Slow (batch updates, redundant data)
Use Case Daily operations (checkouts, returns) Monthly reports, trend analysis

Key Takeaways

  1. Normalized (OLTP):

    • Like a librarys backend system—efficient for daily transactions.
    • "Where is this book? Who borrowed it?" → Fast updates, strict integrity.
  2. Denormalized (OLAP):

    • Like a librarys annual report—optimized for questions like:
      • "Which genre is most popular?"
      • "Do VIP members borrow more books?"
    • Redundancy is intentional to avoid joins.

Would you like to see how an ETL pipeline transforms the normalized library OLTP data into a denormalized OLAP star schema?