the_information_nexus/OLTPvsOLAP.md at 6b1c5dc652bc0658c119d4e3557ed939bcddfcff


medusa 74233e4abe Add tech_docs/database/OLTPvsOLAP.md

2025-06-22 08:06:30 +00:00


Great idea! Let’s use the library database example for both normalized (OLTP) and denormalized (OLAP) models so we can compare them directly.

Library Database Example

1. Normalized (OLTP) – Optimized for Transactions

Goal: Efficiently manage book checkouts, returns, and updates while minimizing redundancy.

Tables (3NF Structure):

Books Table  
+---------+------------+-----------+  
| BookID  | Title      | AuthorID  |  
+---------+------------+-----------+  

Authors Table  
+-----------+------------+  
| AuthorID  | AuthorName |  
+-----------+------------+  

Borrowers Table  
+-------------+--------------+  
| BorrowerID  | BorrowerName |  
+-------------+--------------+  

Loans Table (Transactions)  
+-----------+-------------+---------+------------+------------+  
| LoanID    | BorrowerID  | BookID  | LoanDate   | ReturnDate |  
+-----------+-------------+---------+------------+------------+

Why Normalized?

- No duplicate data (e.g., author names stored only once).
- Easy to update (e.g., change an author’s name in one place).
- Optimized for fast writes (recording loans, returns).
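The "change an author's name in one place" point can be sketched with Python's built-in `sqlite3` module. This is a minimal illustration, not the library's actual system: the book titles, the replacement name, and the choice of SQLite are assumptions made for the example.

```python
import sqlite3

# In-memory database holding two of the normalized tables (sample rows are hypothetical).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Authors (
    AuthorID   INTEGER PRIMARY KEY,
    AuthorName TEXT NOT NULL
);
CREATE TABLE Books (
    BookID   INTEGER PRIMARY KEY,
    Title    TEXT NOT NULL,
    AuthorID INTEGER NOT NULL REFERENCES Authors(AuthorID)
);
INSERT INTO Authors VALUES (1, 'J.K. Rowling');
INSERT INTO Books VALUES (10, 'Book A', 1), (11, 'Book B', 1);
""")

# The name lives in exactly one row, so a single UPDATE corrects it for every book.
cur = conn.execute(
    "UPDATE Authors SET AuthorName = 'Joanne Rowling' WHERE AuthorID = 1"
)
rows_touched = cur.rowcount

titles_with_author = conn.execute("""
    SELECT Books.Title, Authors.AuthorName
    FROM Books
    JOIN Authors ON Books.AuthorID = Authors.AuthorID
    ORDER BY Books.BookID
""").fetchall()

print(rows_touched)
print(titles_with_author)
```

Only one row is written, yet both books pick up the new name through the join.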

Problem for Analytics:

To find "How many books by Author X were borrowed last month?", you must join three tables (Loans, Books, Authors):

SELECT COUNT(*) 
FROM Loans  
JOIN Books ON Loans.BookID = Books.BookID  
JOIN Authors ON Books.AuthorID = Authors.AuthorID  
WHERE Authors.AuthorName = 'J.K. Rowling' 
  AND Loans.LoanDate BETWEEN '2023-01-01' AND '2023-01-31';

→ Every author-level report repeats these joins, which gets slow as the Loans table grows.
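To see the normalized query run end to end, here is a small sketch using Python's `sqlite3` module. The sample rows (borrower names, book titles, the second author) are invented for illustration, and dates are stored as ISO-8601 text so that `BETWEEN` compares them correctly in SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Authors   (AuthorID INTEGER PRIMARY KEY, AuthorName TEXT);
CREATE TABLE Books     (BookID INTEGER PRIMARY KEY, Title TEXT, AuthorID INTEGER);
CREATE TABLE Borrowers (BorrowerID INTEGER PRIMARY KEY, BorrowerName TEXT);
CREATE TABLE Loans     (LoanID INTEGER PRIMARY KEY, BorrowerID INTEGER,
                        BookID INTEGER, LoanDate TEXT, ReturnDate TEXT);

-- Hypothetical sample data
INSERT INTO Authors VALUES (1, 'J.K. Rowling'), (2, 'Other Author');
INSERT INTO Books VALUES (10, 'Book A', 1), (11, 'Book B', 1), (12, 'Book C', 2);
INSERT INTO Borrowers VALUES (100, 'Alice'), (101, 'Bob');
INSERT INTO Loans VALUES
    (1, 100, 10, '2023-01-05', '2023-01-19'),
    (2, 101, 11, '2023-01-20', NULL),
    (3, 100, 12, '2023-01-22', NULL),
    (4, 101, 10, '2023-02-02', NULL);
""")

# Loans -> Books -> Authors: the author name is two hops away from each loan.
oltp_count = conn.execute("""
    SELECT COUNT(*)
    FROM Loans
    JOIN Books   ON Loans.BookID = Books.BookID
    JOIN Authors ON Books.AuthorID = Authors.AuthorID
    WHERE Authors.AuthorName = 'J.K. Rowling'
      AND Loans.LoanDate BETWEEN '2023-01-01' AND '2023-01-31'
""").fetchone()[0]

print(oltp_count)
```

With this sample data the count is 2: loan 3 is by a different author and loan 4 falls in February.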

2. Denormalized (OLAP) – Optimized for Analytics

Goal: Speed up queries for reports like "Top borrowed authors by month."

Star Schema Structure:

Fact_Loans (Central Fact Table)  
+--------+-------------+---------+-----------+------------+------------+  
| LoanID | BorrowerID  | BookID  | AuthorID  | LoanDate   | ReturnDate |  
+--------+-------------+---------+-----------+------------+------------+  

Dim_Books (Dimension Table)  
+---------+------------+-----------------+  
| BookID  | Title      | Genre           |  
+---------+------------+-----------------+  

Dim_Authors (Dimension Table)  
+-----------+------------+----------------+  
| AuthorID  | AuthorName | Nationality    |  
+-----------+------------+----------------+  

Dim_Borrowers (Dimension Table)  
+-------------+--------------+---------------+  
| BorrowerID  | BorrowerName | MembershipTier|  
+-------------+--------------+---------------+  

Dim_Time (Dimension Table)  
+------------+-------+---------+------+  
| LoanDate   | Month | Quarter | Year |  
+------------+-------+---------+------+

Key Denormalizations:

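- Fact_Loans includes AuthorID (redundant, since it can already be derived from BookID through the OLTP Books table).
- Dim_Time pre-calculates month/quarter/year for faster filtering and grouping.
- Foreign keys are typically not enforced (integrity is checked when the data is loaded, which keeps reads and bulk loads cheap).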
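The first two denormalizations can be sketched in a few lines with Python's `sqlite3` module. This is one possible way to derive the tables, assuming SQLite's `strftime` for the date parts; the sample rows are hypothetical, and a real warehouse load would look different.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Minimal OLTP source tables with hypothetical rows
CREATE TABLE Books (BookID INTEGER PRIMARY KEY, Title TEXT, AuthorID INTEGER);
CREATE TABLE Loans (LoanID INTEGER PRIMARY KEY, BorrowerID INTEGER,
                    BookID INTEGER, LoanDate TEXT, ReturnDate TEXT);
INSERT INTO Books VALUES (10, 'Book A', 1), (12, 'Book C', 2);
INSERT INTO Loans VALUES
    (1, 100, 10, '2023-01-05', '2023-01-19'),
    (2, 101, 12, '2023-05-20', NULL);

-- Copy AuthorID onto every loan row so reports can skip the Books join.
CREATE TABLE Fact_Loans AS
SELECT l.LoanID, l.BorrowerID, l.BookID, b.AuthorID, l.LoanDate, l.ReturnDate
FROM Loans l
JOIN Books b ON l.BookID = b.BookID;

-- Pre-calculate the calendar attributes once per distinct loan date.
CREATE TABLE Dim_Time AS
SELECT DISTINCT
    LoanDate,
    CAST(strftime('%m', LoanDate) AS INTEGER)           AS Month,
    (CAST(strftime('%m', LoanDate) AS INTEGER) + 2) / 3 AS Quarter,
    CAST(strftime('%Y', LoanDate) AS INTEGER)           AS Year
FROM Loans;
""")

fact_rows = conn.execute(
    "SELECT LoanID, AuthorID FROM Fact_Loans ORDER BY LoanID"
).fetchall()
time_rows = conn.execute(
    "SELECT LoanDate, Month, Quarter, Year FROM Dim_Time ORDER BY LoanDate"
).fetchall()

print(fact_rows)
print(time_rows)
```

The quarter expression relies on SQLite's integer division: months 1 to 3 map to 1, months 4 to 6 map to 2, and so on.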

Same Query, But Faster:

SELECT COUNT(*)  
FROM Fact_Loans  
JOIN Dim_Authors ON Fact_Loans.AuthorID = Dim_Authors.AuthorID  
WHERE Dim_Authors.AuthorName = 'J.K. Rowling'  
  AND Fact_Loans.LoanDate BETWEEN '2023-01-01' AND '2023-01-31';

→ One join across two tables instead of two joins across three → typically better performance for analytics.
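The stated goal of this schema, "Top borrowed authors by month", can be sketched against the star schema as follows. The sample rows and the `LoanCount` alias are assumptions for the example; the table and column names follow the structure above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Fact_Loans  (LoanID INTEGER PRIMARY KEY, BorrowerID INTEGER,
                          BookID INTEGER, AuthorID INTEGER,
                          LoanDate TEXT, ReturnDate TEXT);
CREATE TABLE Dim_Authors (AuthorID INTEGER PRIMARY KEY, AuthorName TEXT,
                          Nationality TEXT);
CREATE TABLE Dim_Time    (LoanDate TEXT PRIMARY KEY, Month INTEGER,
                          Quarter INTEGER, Year INTEGER);

-- Hypothetical sample data
INSERT INTO Dim_Authors VALUES (1, 'J.K. Rowling', 'British'),
                               (2, 'Other Author', 'Unknown');
INSERT INTO Dim_Time VALUES ('2023-01-05', 1, 1, 2023), ('2023-01-20', 1, 1, 2023),
                            ('2023-01-22', 1, 1, 2023), ('2023-02-02', 2, 1, 2023);
INSERT INTO Fact_Loans VALUES
    (1, 100, 10, 1, '2023-01-05', '2023-01-19'),
    (2, 101, 11, 1, '2023-01-20', NULL),
    (3, 100, 12, 2, '2023-01-22', NULL),
    (4, 101, 10, 1, '2023-02-02', NULL);
""")

# Each dimension is one join away from the fact table; Month and Year are
# read from Dim_Time instead of being recomputed from the date per row.
loans_by_author_month = conn.execute("""
    SELECT t.Year, t.Month, a.AuthorName, COUNT(*) AS LoanCount
    FROM Fact_Loans f
    JOIN Dim_Authors a ON f.AuthorID = a.AuthorID
    JOIN Dim_Time t    ON f.LoanDate = t.LoanDate
    GROUP BY t.Year, t.Month, a.AuthorName
    ORDER BY t.Year, t.Month, LoanCount DESC, a.AuthorName
""").fetchall()

for row in loans_by_author_month:
    print(row)
```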

Side-by-Side Comparison (Library Example)

| Feature | Normalized (OLTP) | Denormalized (OLAP) |
|---|---|---|
| Structure | 4 tables (Books, Authors, Borrowers, Loans) | Star schema: 1 fact table + 4 dimension tables |
| Author Data | Only in `Authors` (linked via `Books`) | `AuthorID` duplicated in `Fact_Loans` for speed |
| Query Complexity | Needs multi-table joins | Fewer joins (optimized for reads) |
| Update Efficiency | High (each fact is stored once) | Low (batch loads, redundant data to keep in sync) |
| Use Case | Daily operations (checkouts, returns) | Monthly reports, trend analysis |

Key Takeaways

Normalized (OLTP):
- Like a library’s backend system—efficient for daily transactions.
- "Where is this book? Who borrowed it?" → Fast updates, strict integrity.
Denormalized (OLAP):
- Like a library’s annual report—optimized for questions like:
  - "Which genre is most popular?"
  - "Do VIP members borrow more books?"
- Redundancy is intentional to avoid joins.
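As a rough sketch of the "Do VIP members borrow more books?" question, the query below groups the fact table by `MembershipTier`. The tier names, borrowers, and loans are hypothetical, and borrowers with no loans do not appear because of the inner join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Fact_Loans    (LoanID INTEGER PRIMARY KEY, BorrowerID INTEGER,
                            BookID INTEGER, AuthorID INTEGER,
                            LoanDate TEXT, ReturnDate TEXT);
CREATE TABLE Dim_Borrowers (BorrowerID INTEGER PRIMARY KEY, BorrowerName TEXT,
                            MembershipTier TEXT);

-- Hypothetical sample data
INSERT INTO Dim_Borrowers VALUES (100, 'Alice', 'VIP'),
                                 (101, 'Bob',   'Standard'),
                                 (102, 'Cara',  'Standard');
INSERT INTO Fact_Loans VALUES
    (1, 100, 10, 1, '2023-01-05', '2023-01-19'),
    (2, 100, 11, 1, '2023-01-20', NULL),
    (3, 100, 12, 2, '2023-02-01', NULL),
    (4, 101, 10, 1, '2023-02-02', NULL),
    (5, 102, 12, 2, '2023-02-03', NULL);
""")

# Total loans per tier, plus the average per member who borrowed at least once.
loans_by_tier = conn.execute("""
    SELECT b.MembershipTier,
           COUNT(*) AS LoanCount,
           COUNT(*) * 1.0 / COUNT(DISTINCT f.BorrowerID) AS LoansPerMember
    FROM Fact_Loans f
    JOIN Dim_Borrowers b ON f.BorrowerID = b.BorrowerID
    GROUP BY b.MembershipTier
    ORDER BY b.MembershipTier
""").fetchall()

for row in loans_by_tier:
    print(row)
```

Dividing by the number of distinct borrowers matters here: comparing raw totals alone would favor whichever tier has more members.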

Would you like to see how an ETL pipeline transforms the normalized library OLTP data into a denormalized OLAP star schema?
