Add tech_docs/database/OLTPvsOLAP.md

Great idea! Let’s use the **library database** example for both the **normalized (OLTP)** and the **denormalized (OLAP)** model so we can compare them directly.

---

### **Library Database Example**

#### **1. Normalized (OLTP) – Optimized for Transactions**

**Goal:** Efficiently manage book checkouts, returns, and updates while minimizing redundancy.

**Tables (3NF Structure):**
```plaintext
Books Table
+---------+------------+-----------+
| BookID  | Title      | AuthorID  |
+---------+------------+-----------+

Authors Table
+-----------+------------+
| AuthorID  | AuthorName |
+-----------+------------+

Borrowers Table
+-------------+--------------+
| BorrowerID  | BorrowerName |
+-------------+--------------+

Loans Table (Transactions)
+-----------+-------------+---------+------------+------------+
| LoanID    | BorrowerID  | BookID  | LoanDate   | ReturnDate |
+-----------+-------------+---------+------------+------------+
```
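
To make the 3NF layout above concrete, here is a minimal sketch that builds the four tables in an in-memory SQLite database through Python's `sqlite3` module. The column types, the sample rows (`'Book A'`, `'Alice'`, the dates), and the choice of SQLite are assumptions made for illustration; the original only names the tables and columns. It records a checkout and a return, renames an author with a single `UPDATE`, and shows a foreign key rejecting a loan for a book that does not exist.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs unless asked

# Assumed column types; the source only gives table and column names.
conn.executescript("""
CREATE TABLE Authors   (AuthorID INTEGER PRIMARY KEY, AuthorName TEXT NOT NULL);
CREATE TABLE Books     (BookID INTEGER PRIMARY KEY, Title TEXT NOT NULL,
                        AuthorID INTEGER NOT NULL REFERENCES Authors(AuthorID));
CREATE TABLE Borrowers (BorrowerID INTEGER PRIMARY KEY, BorrowerName TEXT NOT NULL);
CREATE TABLE Loans     (LoanID INTEGER PRIMARY KEY,
                        BorrowerID INTEGER NOT NULL REFERENCES Borrowers(BorrowerID),
                        BookID INTEGER NOT NULL REFERENCES Books(BookID),
                        LoanDate TEXT NOT NULL, ReturnDate TEXT);
""")

# Hypothetical sample data.
conn.execute("INSERT INTO Authors VALUES (1, 'J.K. Rowling')")
conn.executemany("INSERT INTO Books VALUES (?, ?, ?)",
                 [(1, "Book A", 1), (2, "Book B", 1)])
conn.execute("INSERT INTO Borrowers VALUES (1, 'Alice')")

# A checkout is one narrow insert; a return is one narrow update.
conn.execute("INSERT INTO Loans VALUES (1, 1, 1, '2023-01-05', NULL)")
conn.execute("UPDATE Loans SET ReturnDate = '2023-01-19' WHERE LoanID = 1")

# The author's name lives in exactly one row, so one UPDATE fixes it everywhere.
conn.execute("UPDATE Authors SET AuthorName = 'J. K. Rowling' WHERE AuthorID = 1")
names = conn.execute(
    "SELECT DISTINCT AuthorName FROM Books JOIN Authors USING (AuthorID)"
).fetchall()

# Strict integrity: a loan for a non-existent book is rejected.
try:
    conn.execute("INSERT INTO Loans VALUES (2, 1, 999, '2023-01-06', NULL)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(names, rejected)
```
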
**Why Normalized?**
- No duplicate data (e.g., each author name is stored only once).
- Easy to update (e.g., change an author’s name in one place).
- Optimized for **fast writes** (recording loans, returns).

**Problem for Analytics:**
- To find _"How many books by Author X were borrowed last month?"_, you must **join 3 tables** (`Loans`, `Books`, `Authors`):
  ```sql
  SELECT COUNT(*)
  FROM Loans
  JOIN Books ON Loans.BookID = Books.BookID
  JOIN Authors ON Books.AuthorID = Authors.AuthorID
  WHERE Authors.AuthorName = 'J.K. Rowling'
    AND Loans.LoanDate BETWEEN '2023-01-01' AND '2023-01-31';
  ```
  → Can become slow on large datasets, because every loan has to go through `Books` to reach its author.

---

#### **2. Denormalized (OLAP) – Optimized for Analytics**

**Goal:** Speed up queries for reports like _"Top borrowed authors by month."_

**Star Schema Structure:**
```plaintext
Fact_Loans (Central Fact Table)
+--------+-------------+---------+-----------+------------+------------+
| LoanID | BorrowerID  | BookID  | AuthorID  | LoanDate   | ReturnDate |
+--------+-------------+---------+-----------+------------+------------+

Dim_Books (Dimension Table)
+---------+------------+-----------------+
| BookID  | Title      | Genre           |
+---------+------------+-----------------+

Dim_Authors (Dimension Table)
+-----------+------------+----------------+
| AuthorID  | AuthorName | Nationality    |
+-----------+------------+----------------+

Dim_Borrowers (Dimension Table)
+-------------+--------------+----------------+
| BorrowerID  | BorrowerName | MembershipTier |
+-------------+--------------+----------------+

Dim_Time (Dimension Table)
+------------+-------+---------+------+
| LoanDate   | Month | Quarter | Year |
+------------+-------+---------+------+
```
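
One way to see where the extra `AuthorID` column and the `Dim_Time` rows come from is to load them from the normalized tables. The sketch below is an assumption-laden illustration, not a full ETL pipeline: it uses in-memory SQLite via Python's `sqlite3`, invents a few sample rows, keeps only the source columns the load needs, and derives the quarter with integer arithmetic on SQLite's `strftime('%m', ...)`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalized source tables (hypothetical sample rows)
CREATE TABLE Books (BookID INTEGER PRIMARY KEY, Title TEXT, AuthorID INTEGER);
CREATE TABLE Loans (LoanID INTEGER PRIMARY KEY, BorrowerID INTEGER, BookID INTEGER,
                    LoanDate TEXT, ReturnDate TEXT);
INSERT INTO Books VALUES (1, 'Book A', 10), (2, 'Book B', 20);
INSERT INTO Loans VALUES (1, 100, 1, '2023-01-05', '2023-01-19'),
                         (2, 101, 2, '2023-01-05', NULL),
                         (3, 100, 2, '2023-04-02', NULL);

-- Star-schema targets
CREATE TABLE Fact_Loans (LoanID INTEGER PRIMARY KEY, BorrowerID INTEGER, BookID INTEGER,
                         AuthorID INTEGER, LoanDate TEXT, ReturnDate TEXT);
CREATE TABLE Dim_Time   (LoanDate TEXT PRIMARY KEY, Month INTEGER,
                         Quarter INTEGER, Year INTEGER);

-- Copy AuthorID from Books onto every loan row (the redundant column)
INSERT INTO Fact_Loans
SELECT l.LoanID, l.BorrowerID, l.BookID, b.AuthorID, l.LoanDate, l.ReturnDate
FROM Loans l JOIN Books b ON b.BookID = l.BookID;

-- Pre-calculate month/quarter/year once per distinct loan date
INSERT INTO Dim_Time
SELECT DISTINCT LoanDate,
       CAST(strftime('%m', LoanDate) AS INTEGER),
       (CAST(strftime('%m', LoanDate) AS INTEGER) + 2) / 3,  -- integer division
       CAST(strftime('%Y', LoanDate) AS INTEGER)
FROM Loans;
""")

fact_rows = conn.execute(
    "SELECT LoanID, AuthorID FROM Fact_Loans ORDER BY LoanID").fetchall()
time_rows = conn.execute(
    "SELECT LoanDate, Month, Quarter, Year FROM Dim_Time ORDER BY LoanDate").fetchall()
print(fact_rows)
print(time_rows)
```
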
**Key Denormalizations:**
1. **Fact_Loans** includes **AuthorID** (redundant, since the normalized model already stores it in `Books`).
2. **Dim_Time** pre-calculates month/quarter/year for faster filtering.
3. **No strict foreign keys** (constraints are often left unenforced because the tables are optimized for reads, not updates).

**Same Query, But Faster:**
```sql
SELECT COUNT(*)
FROM Fact_Loans
JOIN Dim_Authors ON Fact_Loans.AuthorID = Dim_Authors.AuthorID
WHERE Dim_Authors.AuthorName = 'J.K. Rowling'
  AND Fact_Loans.LoanDate BETWEEN '2023-01-01' AND '2023-01-31';
```
→ Fewer joins (just 2 tables instead of 3) → typically **better performance for analytics**.

---

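
As a sanity check on the two queries above, the following sketch runs both against the same hypothetical data and compares the counts. It assumes in-memory SQLite through Python's `sqlite3`; the sample rows and the names `OLTP_QUERY` and `OLAP_QUERY` are made up for the example. It only shows that the two schemas answer the question identically; it does not measure speed, which depends on data volume, indexes, and the database engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Normalized side (hypothetical sample rows)
CREATE TABLE Authors (AuthorID INTEGER PRIMARY KEY, AuthorName TEXT);
CREATE TABLE Books   (BookID INTEGER PRIMARY KEY, Title TEXT, AuthorID INTEGER);
CREATE TABLE Loans   (LoanID INTEGER PRIMARY KEY, BorrowerID INTEGER, BookID INTEGER,
                      LoanDate TEXT, ReturnDate TEXT);
INSERT INTO Authors VALUES (1, 'J.K. Rowling'), (2, 'Another Author');
INSERT INTO Books   VALUES (1, 'Book A', 1), (2, 'Book B', 1), (3, 'Book C', 2);
INSERT INTO Loans   VALUES (1, 100, 1, '2023-01-05', NULL),
                           (2, 101, 2, '2023-01-20', NULL),
                           (3, 100, 3, '2023-01-21', NULL),  -- different author
                           (4, 101, 1, '2023-02-03', NULL);  -- outside January

-- Star-schema side: the fact table carries AuthorID directly
CREATE TABLE Dim_Authors (AuthorID INTEGER PRIMARY KEY, AuthorName TEXT);
CREATE TABLE Fact_Loans  (LoanID INTEGER PRIMARY KEY, BorrowerID INTEGER, BookID INTEGER,
                          AuthorID INTEGER, LoanDate TEXT, ReturnDate TEXT);
INSERT INTO Dim_Authors SELECT AuthorID, AuthorName FROM Authors;
INSERT INTO Fact_Loans
SELECT l.LoanID, l.BorrowerID, l.BookID, b.AuthorID, l.LoanDate, l.ReturnDate
FROM Loans l JOIN Books b ON b.BookID = l.BookID;
""")

OLTP_QUERY = """
SELECT COUNT(*)
FROM Loans
JOIN Books ON Loans.BookID = Books.BookID
JOIN Authors ON Books.AuthorID = Authors.AuthorID
WHERE Authors.AuthorName = 'J.K. Rowling'
  AND Loans.LoanDate BETWEEN '2023-01-01' AND '2023-01-31'
"""

OLAP_QUERY = """
SELECT COUNT(*)
FROM Fact_Loans
JOIN Dim_Authors ON Fact_Loans.AuthorID = Dim_Authors.AuthorID
WHERE Dim_Authors.AuthorName = 'J.K. Rowling'
  AND Fact_Loans.LoanDate BETWEEN '2023-01-01' AND '2023-01-31'
"""

oltp_count = conn.execute(OLTP_QUERY).fetchone()[0]
olap_count = conn.execute(OLAP_QUERY).fetchone()[0]
print(oltp_count, olap_count)
```

With this sample data both counts are 2: the February loan and the loan of the other author's book are filtered out in either schema.
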
### **Side-by-Side Comparison (Library Example)**

| Feature               | Normalized (OLTP)                               | Denormalized (OLAP)                                   |
|-----------------------|-------------------------------------------------|-------------------------------------------------------|
| **Structure**         | 4 tables (Books, Authors, Borrowers, Loans)     | Star schema: 1 fact table + dimensions                |
| **Author Data**       | Only in `Authors` (linked via `Books`)          | `AuthorID` duplicated in `Fact_Loans` for speed       |
| **Query Complexity**  | Needs multi-table joins (3 tables in the example) | Fewer joins (2 tables for the same query)           |
| **Update Efficiency** | High (no redundancy, each fact stored once)     | Lower (batch updates, redundant data to keep in sync) |
| **Use Case**          | Daily operations (checkouts, returns)           | Monthly reports, trend analysis                       |

---

### **Key Takeaways**

1. **Normalized (OLTP):**
   - Like a **library’s backend system**, efficient for daily transactions.
   - "Where is this book? Who borrowed it?" → Fast updates, strict integrity.

2. **Denormalized (OLAP):**
   - Like a **library’s annual report**, optimized for questions like:
     - _"Which genre is most popular?"_
     - _"Do VIP members borrow more books?"_
   - Redundancy is **intentional** to avoid joins.
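
The two report questions above can be sketched against the star schema as follows. This is an illustration under assumptions: in-memory SQLite via Python's `sqlite3`, invented sample rows, and `'VIP'` and `'Standard'` as hypothetical values of `MembershipTier` (the original names the column but not its values). Each question needs only one join from `Fact_Loans` to a single dimension table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Dim_Books     (BookID INTEGER PRIMARY KEY, Title TEXT, Genre TEXT);
CREATE TABLE Dim_Borrowers (BorrowerID INTEGER PRIMARY KEY, BorrowerName TEXT,
                            MembershipTier TEXT);
CREATE TABLE Fact_Loans    (LoanID INTEGER PRIMARY KEY, BorrowerID INTEGER, BookID INTEGER,
                            AuthorID INTEGER, LoanDate TEXT, ReturnDate TEXT);

-- Hypothetical sample rows
INSERT INTO Dim_Books     VALUES (1, 'Book A', 'Fantasy'), (2, 'Book B', 'History');
INSERT INTO Dim_Borrowers VALUES (100, 'Alice', 'VIP'), (101, 'Bob', 'Standard');
INSERT INTO Fact_Loans    VALUES (1, 100, 1, 10, '2023-01-05', NULL),
                                 (2, 100, 2, 20, '2023-01-09', NULL),
                                 (3, 101, 1, 10, '2023-01-12', NULL),
                                 (4, 100, 1, 10, '2023-02-01', NULL);
""")

# "Which genre is most popular?": loans per genre, most borrowed first
by_genre = conn.execute("""
    SELECT d.Genre, COUNT(*) AS LoanCount
    FROM Fact_Loans f
    JOIN Dim_Books d ON d.BookID = f.BookID
    GROUP BY d.Genre
    ORDER BY LoanCount DESC
""").fetchall()

# "Do VIP members borrow more books?": average loans per borrower in each tier
by_tier = conn.execute("""
    SELECT b.MembershipTier,
           COUNT(*) * 1.0 / COUNT(DISTINCT b.BorrowerID) AS LoansPerBorrower
    FROM Fact_Loans f
    JOIN Dim_Borrowers b ON b.BorrowerID = f.BorrowerID
    GROUP BY b.MembershipTier
    ORDER BY LoansPerBorrower DESC
""").fetchall()

print(by_genre)
print(by_tier)
```
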

Would you like to see how an **ETL pipeline** transforms the normalized library OLTP data into a denormalized OLAP star schema?