Great idea! Let’s use the library database example for both normalized (OLTP) and denormalized (OLAP) models so we can compare them directly.
Library Database Example
1. Normalized (OLTP) – Optimized for Transactions
Goal: Efficiently manage book checkouts, returns, and updates while minimizing redundancy.
Tables (3NF Structure):
Books Table
+---------+------------+-----------+
| BookID | Title | AuthorID |
+---------+------------+-----------+
Authors Table
+-----------+------------+
| AuthorID | AuthorName |
+-----------+------------+
Borrowers Table
+-------------+--------------+
| BorrowerID | BorrowerName |
+-------------+--------------+
Loans Table (Transactions)
+-----------+-------------+---------+------------+------------+
| LoanID | BorrowerID | BookID | LoanDate | ReturnDate |
+-----------+-------------+---------+------------+------------+
Why Normalized?
- No duplicate data (e.g., author names stored only once).
- Easy to update (e.g., change an author’s name in one place).
- Optimized for fast writes (recording loans, returns).
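The points above can be tried out in a small sketch. It uses Python's built-in sqlite3 module with an in-memory database. The column types, the constraints, and the sample titles and borrower are assumptions made for illustration; the original example only names the columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default

# The four 3NF tables from the example; types and constraints are assumed.
conn.executescript("""
    CREATE TABLE Authors (
        AuthorID   INTEGER PRIMARY KEY,
        AuthorName TEXT NOT NULL
    );
    CREATE TABLE Books (
        BookID   INTEGER PRIMARY KEY,
        Title    TEXT NOT NULL,
        AuthorID INTEGER NOT NULL REFERENCES Authors(AuthorID)
    );
    CREATE TABLE Borrowers (
        BorrowerID   INTEGER PRIMARY KEY,
        BorrowerName TEXT NOT NULL
    );
    CREATE TABLE Loans (
        LoanID     INTEGER PRIMARY KEY,
        BorrowerID INTEGER NOT NULL REFERENCES Borrowers(BorrowerID),
        BookID     INTEGER NOT NULL REFERENCES Books(BookID),
        LoanDate   TEXT NOT NULL,  -- ISO 8601 date string
        ReturnDate TEXT            -- NULL while the book is still out
    );
""")

# Made-up sample rows.
conn.execute("INSERT INTO Authors VALUES (1, 'J.K. Rowling')")
conn.executemany(
    "INSERT INTO Books VALUES (?, ?, ?)",
    [(1, "Sample Title A", 1), (2, "Sample Title B", 1)],
)
conn.execute("INSERT INTO Borrowers VALUES (1, 'Sample Borrower')")

# Recording a loan is one small insert.
conn.execute("INSERT INTO Loans VALUES (1, 1, 1, '2023-01-05', NULL)")

# Renaming the author touches exactly one row, yet every book sees the change.
cur = conn.execute(
    "UPDATE Authors SET AuthorName = 'Joanne Rowling' WHERE AuthorID = 1"
)
rows_changed = cur.rowcount
names = [
    row[0]
    for row in conn.execute(
        "SELECT a.AuthorName FROM Books b "
        "JOIN Authors a ON a.AuthorID = b.AuthorID ORDER BY b.BookID"
    )
]
print(rows_changed, names)
```

Because the author's name lives only in Authors, the rename is a single-row write and both books pick it up through the join.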
Problem for Analytics:
- To find "How many books by Author X were borrowed last month?", you must join 3 tables (Loans, Books, Authors):
SELECT COUNT(*)
FROM Loans
JOIN Books ON Loans.BookID = Books.BookID
JOIN Authors ON Books.AuthorID = Authors.AuthorID
WHERE Authors.AuthorName = 'J.K. Rowling'
AND Loans.LoanDate BETWEEN '2023-01-01' AND '2023-01-31';
→ Every loan has to go through Books to reach its author, which gets slow on large datasets.
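To see the analytics query run against actual rows, here is a minimal sketch using Python's sqlite3 module. The helper name `count_loans_by_author`, the trimmed-down table definitions, and all sample rows are assumptions for illustration; only the SQL itself follows the example.

```python
import sqlite3

def count_loans_by_author(conn, author_name, start, end):
    """Count loans of one author's books in a date range (normalized schema)."""
    sql = """
        SELECT COUNT(*)
        FROM Loans
        JOIN Books   ON Loans.BookID   = Books.BookID
        JOIN Authors ON Books.AuthorID = Authors.AuthorID
        WHERE Authors.AuthorName = ?
          AND Loans.LoanDate BETWEEN ? AND ?
    """
    return conn.execute(sql, (author_name, start, end)).fetchone()[0]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Authors (AuthorID INTEGER PRIMARY KEY, AuthorName TEXT);
    CREATE TABLE Books   (BookID INTEGER PRIMARY KEY, Title TEXT, AuthorID INTEGER);
    CREATE TABLE Loans   (LoanID INTEGER PRIMARY KEY, BorrowerID INTEGER,
                          BookID INTEGER, LoanDate TEXT, ReturnDate TEXT);
""")

# Made-up sample data: books 1 and 2 belong to author 1, book 3 to author 2.
conn.executemany("INSERT INTO Authors VALUES (?, ?)",
                 [(1, "J.K. Rowling"), (2, "Other Author")])
conn.executemany("INSERT INTO Books VALUES (?, ?, ?)",
                 [(1, "Sample A", 1), (2, "Sample B", 1), (3, "Sample C", 2)])
conn.executemany("INSERT INTO Loans VALUES (?, ?, ?, ?, ?)", [
    (1, 1, 1, "2023-01-05", "2023-01-20"),  # author 1, January
    (2, 2, 2, "2023-01-28", None),          # author 1, January
    (3, 1, 3, "2023-01-10", None),          # author 2, January
    (4, 2, 1, "2023-02-02", None),          # author 1, February
])

count = count_loans_by_author(conn, "J.K. Rowling", "2023-01-01", "2023-01-31")
print(count)
```

Storing dates as ISO 8601 text keeps the BETWEEN comparison correct in SQLite, since such strings sort in calendar order.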
2. Denormalized (OLAP) – Optimized for Analytics
Goal: Speed up queries for reports like "Top borrowed authors by month."
Star Schema Structure:
Fact_Loans (Central Fact Table)
+--------+-------------+---------+-----------+------------+------------+
| LoanID | BorrowerID | BookID | AuthorID | LoanDate | ReturnDate |
+--------+-------------+---------+-----------+------------+------------+
Dim_Books (Dimension Table)
+---------+------------+-----------------+
| BookID | Title | Genre |
+---------+------------+-----------------+
Dim_Authors (Dimension Table)
+-----------+------------+----------------+
| AuthorID | AuthorName | Nationality |
+-----------+------------+----------------+
Dim_Borrowers (Dimension Table)
+-------------+--------------+---------------+
| BorrowerID | BorrowerName | MembershipTier|
+-------------+--------------+---------------+
Dim_Time (Dimension Table)
+------------+-------+---------+------+
| LoanDate | Month | Quarter | Year |
+------------+-------+---------+------+
Key Denormalizations:
- Fact_Loans includes AuthorID (redundant, since it can already be derived from BookID via the OLTP Books table).
- Dim_Time pre-calculates month/quarter/year for faster filtering.
- No strict foreign keys (optimized for reads, not updates).
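As a rough sketch of what the first two denormalizations mean in practice, the functions below build one Dim_Time row from a date and one Fact_Loans row that carries its own AuthorID. The function names, the dictionary row format, and the `book_to_author` mapping are hypothetical and serve only as illustration.

```python
from datetime import date

def dim_time_row(loan_date: date) -> dict:
    """Pre-compute the calendar attributes stored in Dim_Time for one date."""
    return {
        "LoanDate": loan_date.isoformat(),
        "Month": loan_date.month,
        "Quarter": (loan_date.month - 1) // 3 + 1,  # months 1-3 -> 1, 4-6 -> 2, ...
        "Year": loan_date.year,
    }

def fact_loan_row(loan: dict, book_to_author: dict) -> dict:
    """Copy a loan and add the redundant AuthorID looked up from its BookID."""
    return {**loan, "AuthorID": book_to_author[loan["BookID"]]}

# Hypothetical BookID -> AuthorID mapping, as it would come from the OLTP Books table.
book_to_author = {1: 1, 2: 1, 3: 2}

loan = {"LoanID": 1, "BorrowerID": 1, "BookID": 3,
        "LoanDate": "2023-01-10", "ReturnDate": None}

fact = fact_loan_row(loan, book_to_author)
time_row = dim_time_row(date(2023, 1, 10))
print(fact)
print(time_row)
```

The work of resolving the author and splitting the date happens once at load time, so report queries can filter on ready-made columns.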
Same Query, But Faster:
SELECT COUNT(*)
FROM Fact_Loans
JOIN Dim_Authors ON Fact_Loans.AuthorID = Dim_Authors.AuthorID
WHERE Dim_Authors.AuthorName = 'J.K. Rowling'
AND Fact_Loans.LoanDate BETWEEN '2023-01-01' AND '2023-01-31';
→ One join fewer (2 tables instead of 3, because Books is no longer needed) → Typically better performance for analytics.
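The star-schema query can be run the same way. This sketch uses Python's sqlite3 module with made-up sample rows and assumed column types. It also shows a variant that filters through Dim_Time instead of a date range. That variant is an illustration of how the pre-computed columns might be used, not something the example above specifies.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Fact_Loans  (LoanID INTEGER PRIMARY KEY, BorrowerID INTEGER,
                              BookID INTEGER, AuthorID INTEGER,
                              LoanDate TEXT, ReturnDate TEXT);
    CREATE TABLE Dim_Authors (AuthorID INTEGER PRIMARY KEY, AuthorName TEXT,
                              Nationality TEXT);
    CREATE TABLE Dim_Time    (LoanDate TEXT PRIMARY KEY, Month INTEGER,
                              Quarter INTEGER, Year INTEGER);
""")

# Made-up sample data; AuthorID sits directly on each fact row.
conn.executemany("INSERT INTO Dim_Authors VALUES (?, ?, ?)",
                 [(1, "J.K. Rowling", "British"), (2, "Other Author", "Unknown")])
conn.executemany("INSERT INTO Fact_Loans VALUES (?, ?, ?, ?, ?, ?)", [
    (1, 1, 1, 1, "2023-01-05", "2023-01-20"),
    (2, 2, 2, 1, "2023-01-28", None),
    (3, 1, 3, 2, "2023-01-10", None),
    (4, 2, 1, 1, "2023-02-02", None),
])
conn.executemany("INSERT INTO Dim_Time VALUES (?, ?, ?, ?)", [
    ("2023-01-05", 1, 1, 2023), ("2023-01-28", 1, 1, 2023),
    ("2023-01-10", 1, 1, 2023), ("2023-02-02", 2, 1, 2023),
])

# The query from the example: fact table plus one dimension.
by_range = conn.execute("""
    SELECT COUNT(*)
    FROM Fact_Loans
    JOIN Dim_Authors ON Fact_Loans.AuthorID = Dim_Authors.AuthorID
    WHERE Dim_Authors.AuthorName = 'J.K. Rowling'
      AND Fact_Loans.LoanDate BETWEEN '2023-01-01' AND '2023-01-31'
""").fetchone()[0]

# Variant (assumption): filter on the pre-computed Month/Year in Dim_Time.
by_dim_time = conn.execute("""
    SELECT COUNT(*)
    FROM Fact_Loans
    JOIN Dim_Authors ON Fact_Loans.AuthorID = Dim_Authors.AuthorID
    JOIN Dim_Time    ON Fact_Loans.LoanDate = Dim_Time.LoanDate
    WHERE Dim_Authors.AuthorName = 'J.K. Rowling'
      AND Dim_Time.Year = 2023 AND Dim_Time.Month = 1
""").fetchone()[0]

print(by_range, by_dim_time)
```

Both queries count the same loans. The second trades an extra join for simple equality filters, which tends to pay off when reports group by month, quarter, or year.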
Side-by-Side Comparison (Library Example)
| Feature | Normalized (OLTP) | Denormalized (OLAP) |
|---|---|---|
| Structure | 4+ tables (Books, Authors, Loans, etc.) | Star schema: 1 fact table + dimensions |
| Author Data | Only in Authors (linked via Books) | AuthorID duplicated in Fact_Loans for speed |
| Query Complexity | Needs multi-table joins | Fewer joins (optimized for reads) |
| Update Efficiency | High (each value stored once) | Lower (batch loads, redundant data to keep in sync) |
| Use Case | Daily operations (checkouts, returns) | Monthly reports, trend analysis |
Key Takeaways
- Normalized (OLTP):
  - Like a library’s backend system: efficient for daily transactions.
  - "Where is this book? Who borrowed it?" → Fast updates, strict integrity.
- Denormalized (OLAP):
  - Like a library’s annual report: optimized for questions like:
    - "Which genre is most popular?"
    - "Do VIP members borrow more books?"
  - Redundancy is intentional to avoid joins.
Would you like to see how an ETL pipeline transforms the normalized library OLTP data into a denormalized OLAP star schema?