Files

120 lines
4.7 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

Great idea! Lets use the **library database** example for both **normalized (OLTP)** and **denormalized (OLAP)** models so we can compare them directly.
---
### **Library Database Example**
#### **1. Normalized (OLTP) Optimized for Transactions**
**Goal:** Efficiently manage book checkouts, returns, and updates while minimizing redundancy.
**Tables (3NF Structure):**
```plaintext
Books Table
+---------+------------+-----------+
| BookID | Title | AuthorID |
+---------+------------+-----------+
Authors Table
+-----------+------------+
| AuthorID | AuthorName |
+-----------+------------+
Borrowers Table
+-------------+--------------+
| BorrowerID | BorrowerName |
+-------------+--------------+
Loans Table (Transactions)
+-----------+-------------+---------+------------+------------+
| LoanID | BorrowerID | BookID | LoanDate | ReturnDate |
+-----------+-------------+---------+------------+------------+
```
**Why Normalized?**
- No duplicate data (e.g., author names stored only once).
- Easy to update (e.g., change an authors name in one place).
- Optimized for **fast writes** (recording loans, returns).
**Problem for Analytics:**
- To find _"How many books by Author X were borrowed last month?"_, you must **join 4 tables**:
```sql
SELECT COUNT(*)
FROM Loans
JOIN Books ON Loans.BookID = Books.BookID
JOIN Authors ON Books.AuthorID = Authors.AuthorID
WHERE Authors.AuthorName = 'J.K. Rowling'
AND Loans.LoanDate BETWEEN '2023-01-01' AND '2023-01-31';
```
→ Slow for large datasets.
---
#### **2. Denormalized (OLAP) Optimized for Analytics**
**Goal:** Speed up queries for reports like _"Top borrowed authors by month."_
**Star Schema Structure:**
```plaintext
Fact_Loans (Central Fact Table)
+--------+-------------+---------+-----------+------------+------------+
| LoanID | BorrowerID | BookID | AuthorID | LoanDate | ReturnDate |
+--------+-------------+---------+-----------+------------+------------+
Dim_Books (Dimension Table)
+---------+------------+-----------------+
| BookID | Title | Genre |
+---------+------------+-----------------+
Dim_Authors (Dimension Table)
+-----------+------------+----------------+
| AuthorID | AuthorName | Nationality |
+-----------+------------+----------------+
Dim_Borrowers (Dimension Table)
+-------------+--------------+---------------+
| BorrowerID | BorrowerName | MembershipTier|
+-------------+--------------+---------------+
Dim_Time (Dimension Table)
+------------+-------+-------+------+
| LoanDate | Month | Quarter | Year|
+------------+-------+-------+------+
```
**Key Denormalizations:**
1. **Fact_Loans** includes **AuthorID** (redundant, since its also in `Books`).
2. **Dim_Time** pre-calculates month/quarter/year for faster filtering.
3. **No strict foreign keys** (optimized for reads, not updates).
**Same Query, But Faster:**
```sql
SELECT COUNT(*)
FROM Fact_Loans
JOIN Dim_Authors ON Fact_Loans.AuthorID = Dim_Authors.AuthorID
WHERE Dim_Authors.AuthorName = 'J.K. Rowling'
AND Fact_Loans.LoanDate BETWEEN '2023-01-01' AND '2023-01-31';
```
→ Fewer joins (just 2 tables) → **Better performance for analytics**.
---
### **Side-by-Side Comparison (Library Example)**
| Feature | Normalized (OLTP) | Denormalized (OLAP) |
|-----------------------|-------------------------------------------|-----------------------------------------|
| **Structure** | 4+ tables (Books, Authors, Loans, etc.) | Star schema: 1 fact table + dimensions |
| **Author Data** | Only in `Authors` (linked via `Books`) | Duplicated in `Fact_Loans` for speed |
| **Query Complexity** | Needs multi-table joins | Fewer joins (optimized for reads) |
| **Update Efficiency** | Perfect (no redundancy) | Slow (batch updates, redundant data) |
| **Use Case** | Daily operations (checkouts, returns) | Monthly reports, trend analysis |
---
### **Key Takeaways**
1. **Normalized (OLTP):**
- Like a **librarys backend system**—efficient for daily transactions.
- "Where is this book? Who borrowed it?" → Fast updates, strict integrity.
2. **Denormalized (OLAP):**
- Like a **librarys annual report**—optimized for questions like:
- _"Which genre is most popular?"_
- _"Do VIP members borrow more books?"_
- Redundancy is **intentional** to avoid joins.
Would you like to see how an **ETL pipeline** transforms the normalized library OLTP data into a denormalized OLAP star schema?