This guide connects to the YouTube Data API, downloads your playlist data using `curl`, and stores it in a `sqlite3` database. The task is broken into stages. Authentication and data fetching both go through the YouTube Data API v3, which also exposes YouTube Music playlists.

### Stage 1: Setup and API Authentication

#### 1.1 Create a Project and Enable YouTube Data API

1. Go to the [Google Cloud Console](https://console.developers.google.com/).
2. Create a new project.
3. Enable the YouTube Data API v3 for your project.
4. Create OAuth 2.0 credentials for your project and download the JSON file.

#### 1.2 Using `curl` to Connect to the API

First, you'll need to authenticate with OAuth 2.0. Here is a simple way to get an access token:

1. **Request User Authorization**

   Open a browser and navigate to the following URL, replacing `YOUR_CLIENT_ID` and `YOUR_REDIRECT_URI` with your OAuth 2.0 Client ID and Redirect URI:

   ```
   https://accounts.google.com/o/oauth2/v2/auth?scope=https://www.googleapis.com/auth/youtube.readonly&access_type=offline&include_granted_scopes=true&response_type=code&client_id=YOUR_CLIENT_ID&redirect_uri=YOUR_REDIRECT_URI
   ```

   After you grant permission, Google redirects to the specified `redirect_uri` with a `code` query parameter.

2. **Exchange Authorization Code for Access Token**

   Use `curl` to exchange the authorization code for an access token:

   ```bash
   curl \
     -d "code=YOUR_AUTH_CODE" \
     -d "client_id=YOUR_CLIENT_ID" \
     -d "client_secret=YOUR_CLIENT_SECRET" \
     -d "redirect_uri=YOUR_REDIRECT_URI" \
     -d "grant_type=authorization_code" \
     https://oauth2.googleapis.com/token
   ```

   This returns a JSON response containing the `access_token` and, because the request used `access_type=offline`, a `refresh_token`.

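The two browser-side steps above can also be sketched in Python, which avoids hand-editing a long URL and takes care of percent-encoding. This is a minimal sketch using only the standard library; the helper names `build_auth_url` and `extract_auth_code` are illustrative and not part of any Google library, and the client ID and redirect URI shown are placeholders.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

AUTH_ENDPOINT = "https://accounts.google.com/o/oauth2/v2/auth"
SCOPE = "https://www.googleapis.com/auth/youtube.readonly"


def build_auth_url(client_id: str, redirect_uri: str) -> str:
    """Return the consent-screen URL with every parameter percent-encoded."""
    params = {
        "scope": SCOPE,
        "access_type": "offline",
        "include_granted_scopes": "true",
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
    }
    return f"{AUTH_ENDPOINT}?{urlencode(params)}"


def extract_auth_code(redirected_url: str) -> str:
    """Pull the `code` query parameter out of the URL Google redirected to."""
    query = parse_qs(urlsplit(redirected_url).query)
    if "code" not in query:
        raise ValueError("no authorization code in redirect URL")
    return query["code"][0]


# Placeholder values; substitute your own credentials.
print(build_auth_url("YOUR_CLIENT_ID", "http://localhost:8080/callback"))
```

The value returned by `extract_auth_code` is what goes into the `code=YOUR_AUTH_CODE` field of the `curl` request.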
#### 1.3 Fetch Playlist Data

Now that you have the access token, you can fetch your playlists:

```bash
curl \
  -H "Authorization: Bearer YOUR_ACCESS_TOKEN" \
  "https://www.googleapis.com/youtube/v3/playlists?part=snippet&mine=true"
```

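A single request like the one above returns only one page of results; when more exist, the response carries a `nextPageToken` that is sent back as the `pageToken` query parameter. The sketch below shows that loop in Python. To stay runnable offline it takes a hypothetical `fetch_page` callable instead of making HTTP calls, and `FAKE_PAGES` is made-up data standing in for the API.

```python
from typing import Callable, Dict, Iterator, Optional


def iter_items(fetch_page: Callable[[Optional[str]], Dict]) -> Iterator[Dict]:
    """Yield every item across pages by following `nextPageToken`."""
    token: Optional[str] = None
    while True:
        page = fetch_page(token)
        yield from page.get("items", [])
        token = page.get("nextPageToken")
        if not token:
            break


# Offline stand-in for the API: two fake pages keyed by page token.
FAKE_PAGES = {
    None: {"items": [{"id": "PL1"}, {"id": "PL2"}], "nextPageToken": "t2"},
    "t2": {"items": [{"id": "PL3"}]},
}

all_ids = [item["id"] for item in iter_items(lambda tok: FAKE_PAGES[tok])]
print(all_ids)
```

In a real script, `fetch_page` would issue the request shown above, adding `pageToken` to the query string whenever the token is not `None`.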
### Stage 2: Store Data in SQLite

Let's create a Python script to fetch the data using the YouTube Data API and store it in a SQLite database.

#### 2.1 Install Required Packages

The `sqlite3` module ships with Python's standard library, so only `requests` needs to be installed:

```bash
pip install requests
```

#### 2.2 Create Python Script

Create a script `fetch_and_store.py`:

```python
import requests
import sqlite3

# Replace with your actual access token
ACCESS_TOKEN = 'YOUR_ACCESS_TOKEN'

# Fetch playlists
response = requests.get(
    'https://www.googleapis.com/youtube/v3/playlists?part=snippet&mine=true',
    headers={'Authorization': f'Bearer {ACCESS_TOKEN}'}
)
response.raise_for_status()  # Stop early on an expired or invalid token
playlists = response.json()

# Connect to SQLite database
conn = sqlite3.connect('youtube_music.db')
c = conn.cursor()

# Create table for playlists
c.execute('''
    CREATE TABLE IF NOT EXISTS playlists (
        id TEXT PRIMARY KEY,
        title TEXT,
        description TEXT,
        published_at TEXT
    )
''')

# Insert playlists into the database
for item in playlists['items']:
    c.execute('''
        INSERT OR REPLACE INTO playlists (id, title, description, published_at)
        VALUES (?, ?, ?, ?)
    ''', (item['id'], item['snippet']['title'], item['snippet']['description'], item['snippet']['publishedAt']))

# Commit and close the connection
conn.commit()
conn.close()

print("Playlists have been successfully saved to the database.")
```

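The script relies on `INSERT OR REPLACE` together with the `id` primary key so that running it repeatedly does not duplicate rows. The sketch below illustrates that behaviour against a throwaway in-memory database; the playlist values are made up for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway database for the demo
conn.execute("""
    CREATE TABLE playlists (
        id TEXT PRIMARY KEY,
        title TEXT,
        description TEXT,
        published_at TEXT
    )
""")

upsert = """
    INSERT OR REPLACE INTO playlists (id, title, description, published_at)
    VALUES (?, ?, ?, ?)
"""
# Same id twice: the second insert replaces the first row.
conn.execute(upsert, ("PL1", "Road Trip", "", "2024-01-01T00:00:00Z"))
conn.execute(upsert, ("PL1", "Road Trip 2", "", "2024-01-01T00:00:00Z"))

rows = conn.execute("SELECT id, title FROM playlists").fetchall()
print(rows)
conn.close()
```

Only one row remains, carrying the most recently written title, which is why re-fetching after renaming a playlist keeps the table current.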
### Stage 3: Fetching More Data

#### 3.1 Fetch Playlist Items

Update the script to fetch and store playlist items. This code reuses the open connection, so place it before the `conn.commit()` and `conn.close()` calls from Stage 2:

```python
# Fetch playlist items (maxResults=50 is the largest page size the API allows)
playlist_id = 'YOUR_PLAYLIST_ID'
response = requests.get(
    f'https://www.googleapis.com/youtube/v3/playlistItems?part=snippet&maxResults=50&playlistId={playlist_id}',
    headers={'Authorization': f'Bearer {ACCESS_TOKEN}'}
)
response.raise_for_status()  # Stop early on an expired or invalid token
playlist_items = response.json()

# Create table for playlist items
c.execute('''
    CREATE TABLE IF NOT EXISTS playlist_items (
        id TEXT PRIMARY KEY,
        playlist_id TEXT,
        title TEXT,
        description TEXT,
        published_at TEXT,
        video_id TEXT
    )
''')

# Insert playlist items into the database
for item in playlist_items['items']:
    c.execute('''
        INSERT OR REPLACE INTO playlist_items (id, playlist_id, title, description, published_at, video_id)
        VALUES (?, ?, ?, ?, ?, ?)
    ''', (item['id'], playlist_id, item['snippet']['title'], item['snippet']['description'], item['snippet']['publishedAt'], item['snippet']['resourceId']['videoId']))

# Commit and close the connection
conn.commit()
conn.close()

print("Playlist items have been successfully saved to the database.")
```

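Indexing nested keys directly, as the insert loop does, raises `KeyError` if a field is absent from a response entry. One possible way to make the row-building step more tolerant is a small helper that falls back to `None`. This is a sketch: `playlist_item_row` is a hypothetical helper name, and `sample` is a made-up entry shaped like the fields the script reads.

```python
from typing import Dict, Tuple


def playlist_item_row(item: Dict, playlist_id: str) -> Tuple:
    """Flatten one playlistItems entry into the playlist_items column order."""
    snippet = item.get("snippet", {})
    return (
        item["id"],
        playlist_id,
        snippet.get("title"),
        snippet.get("description"),
        snippet.get("publishedAt"),
        snippet.get("resourceId", {}).get("videoId"),
    )


# Made-up entry with the description field missing.
sample = {
    "id": "UEwx",
    "snippet": {
        "title": "Wonderwall",
        "publishedAt": "2024-01-01T00:00:00Z",
        "resourceId": {"kind": "youtube#video", "videoId": "abc123"},
    },
}
row = playlist_item_row(sample, "PL1")
print(row)
```

The resulting tuple can be passed straight to the six-placeholder `INSERT OR REPLACE` statement; missing fields are stored as SQL `NULL`.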
### Stage 4: Analyzing Data

You can now analyze the data using SQL queries directly on the SQLite database or by loading the data into a pandas DataFrame for more complex analysis and visualization.

```python
import sqlite3
import pandas as pd
import matplotlib.pyplot as plt

# Connect to SQLite database
conn = sqlite3.connect('youtube_music.db')

# Load playlists into DataFrame
playlists_df = pd.read_sql_query("SELECT * FROM playlists", conn)
print(playlists_df.head())

# Load playlist items into DataFrame
playlist_items_df = pd.read_sql_query("SELECT * FROM playlist_items", conn)
print(playlist_items_df.head())

# Visualization Example
playlist_items_df['title'].value_counts().plot(kind='bar', figsize=(10, 5))
plt.title('Playlist Items by Title')
plt.xlabel('Title')
plt.ylabel('Count')
plt.show()

# Close the connection
conn.close()
```

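For the direct SQL route, a query along the following lines counts the items in each playlist. It is a sketch run against an in-memory database with simplified versions of the two tables (fewer columns than the real schema) and made-up rows, so it works without the real `youtube_music.db`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE playlists (id TEXT PRIMARY KEY, title TEXT);
    CREATE TABLE playlist_items (id TEXT PRIMARY KEY, playlist_id TEXT, title TEXT);
    INSERT INTO playlists VALUES ('PL1', 'Road Trip'), ('PL2', 'Focus');
    INSERT INTO playlist_items VALUES
        ('a', 'PL1', 'Song A'), ('b', 'PL1', 'Song B'), ('c', 'PL2', 'Song C');
""")

# LEFT JOIN keeps playlists that have no stored items (count 0).
counts = conn.execute("""
    SELECT p.title, COUNT(i.id) AS item_count
    FROM playlists AS p
    LEFT JOIN playlist_items AS i ON i.playlist_id = p.id
    GROUP BY p.id
    ORDER BY item_count DESC, p.title
""").fetchall()
print(counts)
conn.close()
```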
This staged approach will help you connect to the YouTube Data API, fetch playlist data, store it in a SQLite database, and perform data analysis.

---

# YouTube Music Data Analysis

## Setup

```python
from ytmusicapi import YTMusic
import pandas as pd
import matplotlib.pyplot as plt
```

### Initialize YTMusic with OAuth Credentials

```python
ytmusic = YTMusic('oauth.json')
```

## Fetch Data

### Liked Songs

```python
liked_songs = ytmusic.get_liked_songs(limit=100)
liked_songs_df = pd.DataFrame(liked_songs['tracks'])
# Keep only the first credited artist's name for each track
liked_songs_df['artists'] = liked_songs_df['artists'].apply(lambda x: x[0]['name'] if x else None)
liked_songs_df.head()
```

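The lambda above keeps only the first artist, which drops collaborators on multi-artist tracks. If that matters for your analysis, a helper along these lines joins every name instead. This is a sketch: `artist_names` is a hypothetical helper, and it assumes each entry is a dict with a `name` key, as the lambda above already does.

```python
from typing import Dict, List, Optional


def artist_names(artists: Optional[List[Dict]]) -> Optional[str]:
    """Join every credited artist name; return None when the list is empty or missing."""
    if not artists:
        return None
    return ", ".join(a["name"] for a in artists if a.get("name"))


print(artist_names([{"name": "Oasis", "id": "x"}, {"name": "Noel Gallagher", "id": "y"}]))
print(artist_names([]))
```

It can replace the lambda directly: `liked_songs_df['artists'].apply(artist_names)`.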
### Playlists

```python
playlists = ytmusic.get_library_playlists(limit=25)
playlists_df = pd.DataFrame(playlists)
playlists_df.head()
```

### History

```python
history = ytmusic.get_history()
history_df = pd.DataFrame(history)
history_df.head()
```

## Data Visualization

### Liked Songs by Artist

```python
liked_songs_df['artists'].value_counts().plot(kind='bar', figsize=(10, 5))
plt.title('Liked Songs by Artist')
plt.xlabel('Artist')
plt.ylabel('Number of Liked Songs')
plt.show()
```

### History by Title

```python
history_df['title'].value_counts().plot(kind='bar', figsize=(10, 5))
plt.title('History by Title')
plt.xlabel('Title')
plt.ylabel('Number of Plays')
plt.show()
```

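With many distinct artists or titles, a bar per value becomes unreadable, so it can help to chart only the most frequent entries. The sketch below shows the counting step with the standard library so that it runs without pandas; `top_n` is a hypothetical helper and the titles are made-up sample data.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_n(values: Iterable[str], n: int = 10) -> List[Tuple[str, int]]:
    """Return the n most frequent values with their counts, most frequent first."""
    return Counter(values).most_common(n)


titles = ["Wonderwall", "Live Forever", "Wonderwall", "Supersonic", "Wonderwall", "Supersonic"]
print(top_n(titles, n=2))
```

In the notebook itself, the equivalent is to insert `.head(10)` between `value_counts()` and `.plot(...)`.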
## Save Data to CSV

```python
liked_songs_df.to_csv('liked_songs.csv', index=False)
playlists_df.to_csv('playlists.csv', index=False)
history_df.to_csv('history.csv', index=False)
```

### Full Script Breakdown

1. **Setup:**
   - Import necessary libraries (`ytmusicapi`, `pandas`, `matplotlib`).
   - Initialize the YTMusic API with OAuth credentials.

2. **Fetch Data:**
   - Get the user's liked songs and convert them to a DataFrame.
   - Get the user's playlists and convert them to a DataFrame.
   - Get the user's history and convert it to a DataFrame.

3. **Data Visualization:**
   - Visualize the liked songs by artist using a bar chart.
   - Visualize the history by title using a bar chart.

4. **Save Data to CSV:**
   - Save the processed DataFrames to CSV files for further analysis or backup.

### How to Use This Notebook

1. **Ensure you have the `oauth.json` file in your project directory, which contains your OAuth credentials for the YTMusic API.**
2. **Start Jupyter Notebook:**
   ```bash
   jupyter notebook
   ```
3. **Create a new notebook or open an existing one and copy the above cells into the notebook.**
4. **Run the cells step by step to fetch, analyze, visualize, and save your YouTube Music data.**

This setup gives you an interactive analysis of your YouTube Music listening data.

---

### Step 1: Set Up Your Python Virtual Environment

First, ensure you have Python installed on your system. I recommend using Python 3.7 or newer. Here’s how you can set up a virtual environment:

1. **Create a New Directory for Your Project (Optional):**
   ```bash
   mkdir yt-music-project
   cd yt-music-project
   ```

2. **Create a Virtual Environment:**
   ```bash
   python -m venv venv
   ```

3. **Activate the Virtual Environment:**
   - On Windows:
     ```bash
     .\venv\Scripts\activate
     ```
   - On macOS and Linux:
     ```bash
     source venv/bin/activate
     ```

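To confirm that activation worked before installing anything, a quick check from Python can help. This is a minimal sketch; `in_virtualenv` is a hypothetical helper, and the comparison applies to environments created with `python -m venv` as above.

```python
import sys


def in_virtualenv() -> bool:
    """True when the interpreter runs inside a venv-created environment."""
    return sys.prefix != sys.base_prefix


print("virtual environment active:", in_virtualenv())
```

If it prints `False`, the shell is still using the system interpreter and the activation command should be run again.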
### Step 2: Install Required Packages

1. **Ensure your `requirements.txt` includes `ytmusicapi`:**
   You can create a `requirements.txt` file containing at least:
   ```
   ytmusicapi
   ```
   If you already have a `requirements.txt`, make sure `ytmusicapi` is listed.

2. **Install the Required Packages:**
   ```bash
   pip install -r requirements.txt
   ```

### Step 3: Set Up OAuth Authentication

1. **Run OAuth Setup:**
   While in your activated virtual environment and your project directory:
   ```bash
   ytmusicapi oauth
   ```
   Follow the on-screen instructions:
   - Visit the URL provided in the command output.
   - Log in with your Google account.
   - Authorize the application if prompted.
   - Copy the provided code back into the terminal.

   This will generate an `oauth.json` file in your project directory containing the necessary credentials.

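Before passing the file to the library, it can be worth checking that it exists and parses, so that a missing or truncated file produces a clear message. The sketch below uses only the standard library; `load_oauth_file` is a hypothetical helper and makes no assumption about which keys the file contains.

```python
import json
from pathlib import Path


def load_oauth_file(path: str = "oauth.json") -> dict:
    """Read the credentials file and confirm it holds a JSON object."""
    file = Path(path)
    if not file.is_file():
        raise FileNotFoundError(f"{path} not found; run `ytmusicapi oauth` first")
    data = json.loads(file.read_text(encoding="utf-8"))
    if not isinstance(data, dict):
        raise ValueError(f"{path} does not contain a JSON object")
    return data
```

Calling `load_oauth_file()` at the top of a script fails fast with a readable error instead of a traceback from deeper inside the API client.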
### Step 4: Initialize YTMusic with OAuth Credentials

1. **Create a Python Script:**
   You can create a Python script like `main.py` to start coding with the API:
   ```python
   from ytmusicapi import YTMusic
   ytmusic = YTMusic('oauth.json')
   ```

### Step 5: Test by Creating a Playlist

1. **Write Code to Create a Playlist and Search for Music:**
   Add to your `main.py`:
   ```python
   # Create a new playlist
   playlist_id = ytmusic.create_playlist("My Awesome Playlist", "A description of my playlist.")

   # Search for a song; filter="songs" keeps artist and album results,
   # which have no videoId, out of the list
   search_results = ytmusic.search("Oasis Wonderwall", filter="songs")

   # Add the first search result to the new playlist
   if search_results:
       ytmusic.add_playlist_items(playlist_id, [search_results[0]['videoId']])
   ```

2. **Run Your Script:**
   ```bash
   python main.py
   ```

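When a search is not restricted to songs, the results can mix in artists, albums, and other entries that carry no `videoId`, so taking index 0 blindly may raise `KeyError`. A defensive way to pick a track is sketched below; `first_video_id` is a hypothetical helper and `sample_results` is made-up data shaped like a mixed result list.

```python
from typing import Dict, List, Optional


def first_video_id(results: List[Dict]) -> Optional[str]:
    """Return the videoId of the first result that has one, else None."""
    for result in results:
        video_id = result.get("videoId")
        if video_id:
            return video_id
    return None


# Made-up results shaped like a mixed search response.
sample_results = [
    {"resultType": "artist", "artist": "Oasis"},
    {"resultType": "song", "title": "Wonderwall", "videoId": "abc123"},
]
print(first_video_id(sample_results))
```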
This setup gives you a complete environment to work with the YTMusic API securely and manage your YouTube Music data programmatically. You can extend it by handling errors, adding functionality, or integrating other data sources and tools for analysis or backup.