medusa/the_information_nexus

Fork 0

Files

medusa b4dd08b075 Update tech_docs/database/sql_guide.md

2025-06-30 08:30:53 +00:00

21 KiB

Raw Permalink Blame History

SQL Basics

1. Database and Table Structure

Database: A collection of related data organized for easy access and management.
Table: A structured set of data organized in rows and columns within a database.

2. Basic SQL Commands

SELECT: Used to fetch data from a database.
INSERT: Used to add new data into a table.
UPDATE: Used to modify existing data in a table.
DELETE: Used to remove data from a table.

Writing SQL Queries

1. SELECT Statement

The SELECT statement is used to retrieve data from a database.

SELECT column1, column2, ...
FROM table_name;

Example:

SELECT Name, Composer
FROM Track;

This query retrieves the Name and Composer columns from the Track table.

2. WHERE Clause

The WHERE clause is used to filter records.

SELECT column1, column2, ...
FROM table_name
WHERE condition;

Example:

SELECT Name, Composer
FROM Track
WHERE GenreId = 1;

This query retrieves the names and composers of tracks in the genre with GenreId 1.

3. INSERT INTO Statement

The INSERT INTO statement is used to add new records to a table.

INSERT INTO table_name (column1, column2, ...)
VALUES (value1, value2, ...);

Example:

INSERT INTO Artist (Name)
VALUES ('New Artist');

This query adds a new artist named 'New Artist' to the Artist table.

4. UPDATE Statement

The UPDATE statement is used to modify existing records.

UPDATE table_name
SET column1 = value1, column2 = value2, ...
WHERE condition;

Example:

UPDATE Track
SET Composer = 'New Composer'
WHERE TrackId = 1;

This query updates the composer of the track with TrackId 1 to 'New Composer'.

5. DELETE Statement

The DELETE statement is used to remove existing records from a table.

DELETE FROM table_name
WHERE condition;

Example:

DELETE FROM Artist
WHERE ArtistId = 1;

This query removes the artist with ArtistId 1.

Advanced SQL Concepts

1. JOINs

JOINs are used to combine rows from two or more tables based on a related column.

INNER JOIN: Returns records with matching values in both tables.

SELECT columns
FROM table1
INNER JOIN table2
ON table1.common_column = table2.common_column;

Example:

SELECT a.Name, t.Title
FROM Album a
INNER JOIN Artist ar ON a.ArtistId = ar.ArtistId
WHERE ar.Name = 'AC/DC';

This query returns the names of albums by the artist 'AC/DC'.

LEFT JOIN: Returns all records from the left table and the matched records from the right table.

SELECT columns
FROM table1
LEFT JOIN table2
ON table1.common_column = table2.common_column;

RIGHT JOIN: Returns all records from the right table and the matched records from the left table.

SELECT columns
FROM table1
RIGHT JOIN table2
ON table1.common_column = table2.common_column;

FULL OUTER JOIN: Returns all records when there is a match in either left or right table.

SELECT columns
FROM table1
FULL OUTER JOIN table2
ON table1.common_column = table2.common_column;

2. GROUP BY and HAVING

GROUP BY: Used to arrange identical data into groups.

SELECT column1, COUNT(*)
FROM table_name
GROUP BY column1;

Example:

SELECT GenreId, COUNT(*)
FROM Track
GROUP BY GenreId;

This query counts the number of tracks in each genre.

HAVING: Used to filter groups.

SELECT column1, COUNT(*)
FROM table_name
GROUP BY column1
HAVING COUNT(*) > 1;

Example:

SELECT GenreId, COUNT(*)
FROM Track
GROUP BY GenreId
HAVING COUNT(*) > 10;

This query returns genres with more than 10 tracks.

3. ORDER BY

The ORDER BY statement is used to sort the result set.

SELECT column1, column2, ...
FROM table_name
ORDER BY column1 ASC|DESC;

Example:

SELECT Name, Composer
FROM Track
ORDER BY Name ASC;

This query retrieves track names and composers sorted by the track name in ascending order.

Practical Example

Consider the following database schema in the Chinook database:

Artist Table:

ArtistId	Name
1	AC/DC
2	Accept

Album Table:

AlbumId	Title	ArtistId
1	For Those...	1
2	Balls to...	2

Query Examples

Fetch all tracks in the 'Rock' genre:

SELECT Name, Composer
FROM Track
WHERE GenreId = 1;

Fetch all artists and the number of albums they have:

SELECT ar.Name, COUNT(al.AlbumId) AS album_count
FROM Artist ar
LEFT JOIN Album al ON ar.ArtistId = al.ArtistId
GROUP BY ar.Name;

This should give you a solid foundation to start with SQL using the Chinook database.

Complete Guide to Getting Started with SQL and SQLite3

Introduction to SQL

SQL (Structured Query Language) is the standard language for managing and manipulating relational databases. It is essential for anyone working in data-related fields. This guide will cover the basics of SQL and provide a comprehensive introduction to using SQLite3, a lightweight database engine.

What You Need to Know About SQL

1. Basic Concepts

Relational Databases: Understand what a relational database is and how data is organized into tables (relations).
Tables and Schemas: Know how to define and understand the schema of a database, including tables, columns, and data types.

2. SQL Syntax and Commands

Data Definition Language (DDL): Commands used to define the database structure:
- CREATE: Create tables and databases.
- ALTER: Modify existing database objects.
- DROP: Delete tables or databases.
Data Manipulation Language (DML): Commands for data manipulation:
- SELECT: Retrieve data from the database.
- INSERT: Add new data to the database.
- UPDATE: Modify existing data.
- DELETE: Remove data from the database.
Data Control Language (DCL): Commands for controlling access to data:
- GRANT: Give user access privileges.
- REVOKE: Remove user access privileges.

3. Querying Data

Basic Queries: Writing simple queries to retrieve data using SELECT statements.
Filtering Data: Using WHERE clauses to filter data.
Sorting Data: Using ORDER BY to sort data.
Aggregate Functions: Using functions like COUNT(), SUM(), AVG(), MIN(), and MAX() to perform calculations on data.

4. Advanced Querying

Joins: Combining data from multiple tables using various types of joins (INNER JOIN, LEFT JOIN, RIGHT JOIN, FULL JOIN).
Subqueries: Writing queries within queries to perform complex operations.
Grouping and Aggregation: Using GROUP BY to group data and aggregate functions to summarize data.
Window Functions: Performing calculations across a set of table rows related to the current row.

5. Database Design

Normalization: Understanding normalization rules to design efficient and consistent databases.
Indexes: Creating and using indexes to improve query performance.
Constraints: Applying constraints like PRIMARY KEY, FOREIGN KEY, UNIQUE, NOT NULL, and CHECK to maintain data integrity.

6. Performance Tuning

Query Optimization: Techniques for writing efficient queries and understanding execution plans.
Indexes and Partitions: Using indexes and partitions to optimize database performance.
Database Maintenance: Regular tasks like backups, indexing, and updating statistics to keep the database running smoothly.

7. SQL in Practice

Real-World Scenarios: Applying SQL skills to real-world scenarios, such as reporting, data analysis, and integrating with applications.
SQL Tools: Familiarity with SQL-based tools and environments (e.g., MySQL Workbench, pgAdmin, SQL Server Management Studio).

8. Continuous Learning

Advanced SQL: Exploring advanced topics like stored procedures, triggers, and advanced data types.
New SQL Technologies: Keeping up with new developments in SQL databases and related technologies.

Practical Tips

Practice Regularly: Regular practice is crucial. Use platforms like LeetCode, HackerRank, and SQLZoo for SQL exercises.
Project-Based Learning: Work on real projects that involve database design, querying, and optimization.
Stay Updated: Follow SQL-related blogs, forums, and documentation to stay updated on best practices and new features.

By mastering these aspects of SQL, you'll be well-equipped to handle data-related tasks and challenges in various roles within the data industry.

Getting Started with SQLite3

SQLite3 is a lightweight, self-contained, serverless database engine that is easy to set up and use. Here’s a guide to get you started with SQLite3:

1. Installation

SQLite3 often comes pre-installed on many systems. You can check if it is installed by running:

sqlite3 --version

If it’s not installed, you can install it using the following commands:

For Debian/Ubuntu:

sudo apt-get update
sudo apt-get install sqlite3

For macOS:

brew install sqlite3

2. Creating a Database

You can create a new SQLite database by simply opening SQLite with a file name:

sqlite3 mydatabase.db

This command creates a new file named mydatabase.db if it does not exist and opens the SQLite prompt.

3. Basic Commands

Here are some basic commands to get you started:

Creating a Table:

CREATE TABLE users (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    age INTEGER
);

Inserting Data:

INSERT INTO users (name, age) VALUES ('Alice', 30);
INSERT INTO users (name, age) VALUES ('Bob', 25);

Querying Data:

SELECT * FROM users;

Updating Data:

UPDATE users SET age = 31 WHERE name = 'Alice';

Deleting Data:

DELETE FROM users WHERE name = 'Bob';

4. Using SQLite3 with Python

SQLite3 can be used directly within Python using the sqlite3 module. Here’s a quick example:

Connecting to the Database:

import sqlite3

# Connect to database (or create it if it doesn't exist)
conn = sqlite3.connect('mydatabase.db')

# Create a cursor object
cur = conn.cursor()

Creating a Table:

cur.execute('''
CREATE TABLE users (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    age INTEGER
)
''')

# Commit the changes
conn.commit()

Inserting Data:

cur.execute('''
INSERT INTO users (name, age) VALUES (?, ?)
''', ('Alice', 30))

cur.execute('''
INSERT INTO users (name, age) VALUES (?, ?)
''', ('Bob', 25))

# Commit the changes
conn.commit()

Querying Data:

cur.execute('SELECT * FROM users')
rows = cur.fetchall()

for row in rows:
    print(row)

Updating Data:

cur.execute('''
UPDATE users SET age = ? WHERE name = ?
''', (31, 'Alice'))

# Commit the changes
conn.commit()

Deleting Data:

cur.execute('''
DELETE FROM users WHERE name = ?
''', ('Bob',))

# Commit the changes
conn.commit()

Closing the Connection:

# Close the cursor and connection
cur.close()
conn.close()

5. Additional Resources

SQLite Official Documentation: https://sqlite.org/docs.html
SQLite Tutorial: https://www.sqlitetutorial.net/

By following these steps and utilizing the resources mentioned, you'll be well on your way to mastering SQLite3 for your data projects.

Here's a comprehensive SQL coding standards document, aligned with the principles and structure we've used for HTML, Bash, and Python.

SQL Coding Standards Documentation

Introduction

This document outlines our conventions and best practices for writing SQL (Structured Query Language) queries, stored procedures, functions, and database schemas. Adhering to these standards ensures our SQL code is consistent, readable, maintainable, and performs optimally. This approach fosters seamless team collaboration, reduces debugging time, and improves overall database reliability and longevity.

Formatting and Readability

Consistent formatting is paramount for understanding complex SQL statements.

1. Indentation

Spaces over Tabs: Use 4 spaces for indentation. Configure your editor to convert tabs to spaces.

Logical Blocks: Indent clauses within statements (e.g., SELECT lists, JOIN conditions, WHERE clauses) to show hierarchy and structure.

Usage:

SELECT
    CustomerID,
    FirstName,
    LastName,
    OrderDate
FROM
    Customers c
JOIN
    Orders o ON c.CustomerID = o.CustomerID
WHERE
    OrderDate >= '2024-01-01'
    AND TotalAmount > 100
ORDER BY
    OrderDate DESC;

2. Capitalization

Keywords Uppercase: SQL keywords (e.g., SELECT, FROM, WHERE, JOIN, AND, OR, CREATE, TABLE, INSERT, UPDATE, DELETE) should be in UPPERCASE.
Identifiers (Tables, Columns, Aliases) Consistent Case: Use a consistent case (e.g., PascalCase or snake_case) for table names, column names, and aliases. This document will primarily use PascalCase for table/column names in examples, but consistency is key within a project.
- Usage:
```
SELECT
    ProductName,
    UnitPrice AS Price
FROM
    Products;
```

3. Line Breaks and Spacing

One Clause Per Line: Generally, place each major clause (SELECT, FROM, WHERE, GROUP BY, ORDER BY, LIMIT/TOP) on a new line.
Commas: In SELECT lists, place the comma before the column name on the subsequent line.
- Benefit: Makes it easy to comment out or reorder columns without dealing with trailing commas.
- Usage:
```
SELECT
    CustomerID
    , FirstName
    , LastName
FROM
    Customers;
```
Operators: Use spaces around operators (=, +, >, AND, OR, IN, LIKE).
- Usage: WHERE Price > 100, JOIN Table2 ON T1.ID = T2.ID

Naming Conventions

Consistent naming makes database objects and their components easy to identify and understand.

1. Tables

Plural Nouns: Use plural nouns.
PascalCase (preferred) or snake_case: Be consistent. PascalCase (e.g., OrderItems) or snake_case (e.g., order_items).
Avoid Prefixes: Avoid redundant prefixes like tbl_.
- Usage: Customers, Products, OrderItems

2. Columns

Singular Nouns: Use singular nouns.
PascalCase (preferred) or snake_case: Be consistent with table naming.
Primary Keys: Suffix primary key columns with ID (e.g., CustomerID, ProductID).
Foreign Keys: Suffix foreign key columns with ID and prefix with the related table's singular name (e.g., CustomerID in Orders table, referencing Customers.CustomerID).
- Usage: FirstName, OrderDate, CustomerID, ProductID

3. Stored Procedures and Functions

Verb-Noun Structure: Use a verb indicating the action, followed by a noun indicating the object.
Prefixes: Use usp_ for stored procedures (User Stored Procedure) and ufn_ for user-defined functions.
PascalCase (preferred) or snake_case: Consistent with other naming.
- Usage: usp_GetCustomerOrders, ufn_CalculateTotalSales

4. Views

Prefixes: Use vw_ for views.
PascalCase (preferred) or snake_case: Consistent.
Descriptive Nouns: Name should describe the data the view represents.
- Usage: vw_ActiveCustomers, vw_ProductSalesSummary

5. Indexes

Prefixes: Use IX_ for non-clustered indexes, PK_ for primary key constraints (which often create clustered indexes), UQ_ for unique constraints.
Naming Convention: [Prefix]_[TableName]_[ColumnNames]
- Usage: IX_Orders_OrderDate, PK_Customers_CustomerID, UQ_Products_SKU

Comments

Good commenting explains the "why" and complex parts of your SQL.

Block Comments (/* ... */):
- Description: Use for multi-line comments or for commenting out larger blocks of code. Ideal for script headers, complex logic explanations, or temporary disabling of code.
- Usage:
```
/*
This query retrieves all active customers
who have placed an order in the last 30 days.
It joins Customers and Orders tables.
*/
SELECT ...
```

Single-Line Comments (--):

Description: Use for single-line explanations or inline comments. Explain why a specific condition or calculation is used.

Usage:

SELECT
    ProductName,
    (UnitPrice * Quantity) AS LineTotal -- Calculate the total for each line item
FROM
    OrderItems
WHERE
    OrderDate >= GETDATE() - 30; -- Only consider orders from the last 30 days

Best Practices and Principles

These principles guide our overall approach to writing effective, performant, and maintainable SQL code.

1. Simplicity and Clarity

Break Down Complex Queries: For very complex queries, consider breaking them down using Common Table Expressions (CTEs) (WITH ... AS (...)) or views to improve readability.
Avoid SELECT *: Always explicitly list the columns you need.
- Benefit: Prevents fetching unnecessary data, makes the query intent clear, protects against schema changes (e.g., new columns breaking application code), and can improve performance.
- Usage: SELECT CustomerID, FirstName, LastName FROM Customers;

2. Performance Considerations

Use Joins Appropriately: Understand the differences between INNER JOIN, LEFT JOIN, RIGHT JOIN, FULL OUTER JOIN. Choose the most appropriate join type based on your data relationships.
Index Usage: Design and use indexes strategically to improve query performance on frequently queried columns (especially in WHERE, JOIN, ORDER BY, GROUP BY clauses). However, avoid over-indexing, as it can degrade write performance.
Avoid Functions in WHERE Clauses: Applying functions to columns in WHERE or JOIN clauses (e.g., WHERE YEAR(OrderDate) = 2024) can prevent the database from using indexes on those columns.
- Instead: Use range conditions (e.g., WHERE OrderDate >= '2024-01-01' AND OrderDate < '2025-01-01').
LIMIT/TOP for Paging: Use LIMIT (MySQL, PostgreSQL) or TOP (SQL Server) for efficient paging and to prevent fetching excessive data.

3. Data Integrity and Constraints

Primary Keys: Every table should have a primary key to uniquely identify rows.
Foreign Keys: Use foreign keys to enforce referential integrity between tables.
Constraints: Implement NOT NULL, UNIQUE, CHECK constraints to enforce data validity rules at the database level.
- Benefit: Ensures data quality and consistency, simplifies application logic.

4. Transactions

ACID Properties: For operations involving multiple DML (Data Manipulation Language) statements (e.g., INSERT, UPDATE, DELETE) that must succeed or fail as a single unit, use transactions (BEGIN TRANSACTION, COMMIT, ROLLBACK).

Benefit: Ensures Atomicity, Consistency, Isolation, Durability (ACID) for database operations.

Usage:

BEGIN TRANSACTION;

INSERT INTO Orders (CustomerID, OrderDate, TotalAmount) VALUES (1, GETDATE(), 150.00);
INSERT INTO OrderItems (OrderID, ProductID, Quantity, UnitPrice) VALUES (SCOPE_IDENTITY(), 101, 1, 150.00); -- Example for SQL Server

-- Or for PostgreSQL/MySQL:
-- INSERT INTO OrderItems (OrderID, ProductID, Quantity, UnitPrice) VALUES (LAST_INSERT_ID(), 101, 1, 150.00);

IF @@ERROR <> 0
BEGIN
    ROLLBACK TRANSACTION;
    PRINT 'Transaction failed!';
END
ELSE
BEGIN
    COMMIT TRANSACTION;
    PRINT 'Transaction committed successfully.';
END;

5. Security

Least Privilege Principle: Grant users and applications only the minimum necessary permissions on database objects.
Avoid Dynamic SQL Injection: Be extremely cautious when constructing SQL queries dynamically based on user input. Use parameterized queries or stored procedures to prevent SQL injection attacks.
- Usage (Conceptual - depends on client library/ORM):
```
-- BAD (Vulnerable to SQL Injection):
-- query = "SELECT * FROM Users WHERE Username = '" + username + "'"

-- GOOD (Parameterized Query):
-- query = "SELECT * FROM Users WHERE Username = %s"
-- cursor.execute(query, (username,))
```

Conclusion

By diligently applying these SQL coding standards, we ensure our database interactions are not only functional but also highly readable, maintainable, and performant. This commitment to quality coding practices promotes a more efficient development workflow, reduces technical debt, and strengthens the overall reliability and security of our data systems.

21 KiB Raw Permalink Blame History Unescape Escape

SQL Basics

1. Database and Table Structure

2. Basic SQL Commands

Writing SQL Queries

1. SELECT Statement

2. WHERE Clause

3. INSERT INTO Statement

4. UPDATE Statement

5. DELETE Statement

Advanced SQL Concepts

1. JOINs

2. GROUP BY and HAVING

3. ORDER BY

Practical Example

Query Examples

Complete Guide to Getting Started with SQL and SQLite3

Introduction to SQL

What You Need to Know About SQL

1. Basic Concepts

2. SQL Syntax and Commands

3. Querying Data

4. Advanced Querying

5. Database Design

6. Performance Tuning

7. SQL in Practice

8. Continuous Learning

Practical Tips

Getting Started with SQLite3

1. Installation

2. Creating a Database

3. Basic Commands

4. Using SQLite3 with Python

5. Additional Resources

SQL Coding Standards Documentation

Introduction

Formatting and Readability

1. Indentation

2. Capitalization

3. Line Breaks and Spacing

Naming Conventions

1. Tables

2. Columns

3. Stored Procedures and Functions

4. Views

5. Indexes

Comments

Best Practices and Principles

1. Simplicity and Clarity

2. Performance Considerations

3. Data Integrity and Constraints

4. Transactions

5. Security

Conclusion

21 KiB

Raw Permalink Blame History