the_information_nexus/projects/oci_project.md
2025-06-26 13:39:26 +00:00


This is where the fun begins for a Linux power user! Provisioning your OCI Always Free home lab involves setting up the foundational network, the core compute instance, the database, and then integrating them. We'll prioritize CLI/SSH for a power-user feel, but mention Console options.
Here's a step-by-step guide for provisioning your OCI build:

-----
## **Phase 1: Foundational Setup (OCI Console Recommended for First-Time)**
Even for a Linux power user, the first pass at the compartment, VCN, and other foundational resources is most straightforward in the OCI Console. Once they exist, you can manage them from the CLI.
### **1. Compartment (Optional but Recommended)**
* **Purpose:** Organize your resources and isolate them. Good practice for even a home lab.
* **How to provision:**
* **OCI Console:** `Identity & Security` \> `Compartments` \> `Create Compartment`.
* **Name:** `homelab` (or `data_lab`, `ml_dev`, etc.)
* **Description:** "My OCI Always Free Home Lab Resources"
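
If you would rather script this step, the OCI CLI offers an equivalent. The sketch below assumes the CLI is installed and configured (`oci setup config`) and that `TENANCY_OCID` holds your root compartment's OCID; the OCID shown is only a placeholder, and `run` is a hypothetical helper that prints the command until you set `DRY_RUN=0`.

```bash
# Placeholder: export TENANCY_OCID with your real tenancy (root compartment) OCID.
TENANCY_OCID="${TENANCY_OCID:-ocid1.tenancy.oc1..exampleuniqueID}"

# Print the command by default; execute it only when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"   # shown without shell quoting, for review only
  else
    "$@"
  fi
}

run oci iam compartment create \
  --compartment-id "$TENANCY_OCID" \
  --name homelab \
  --description "My OCI Always Free Home Lab Resources"
```

Review the printed command first, then run the script again with `DRY_RUN=0` to create the compartment.
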
### **2. Virtual Cloud Network (VCN)**
* **Purpose:** Your private, isolated network in OCI. All your resources will live within this.
* **How to provision:**
* **OCI Console:** `Networking` \> `Virtual Cloud Networks`.
* Click `Start VCN Wizard` \> `VCN with Internet Connectivity`.
* **Name:** `homelab-vcn`
* **Compartment:** Select your `homelab` compartment.
* Leave defaults for CIDR blocks and subnets (a public and private subnet will be created, along with an Internet Gateway and NAT Gateway). This is generally fine for a basic home lab.
* **Crucial:** Ensure "Enable Public Subnet" and "Enable Private Subnet" are checked.
* Click `Create VCN`.
### **3. Generate/Upload SSH Key Pair**
* **Purpose:** Securely connect to your Linux instances. You'll need this *before* launching the instance.
* **How to provision:**
* **On your local machine (Linux/macOS):**
```bash
mkdir -p ~/.oci_keys && chmod 700 ~/.oci_keys # Private directory for the key pair
ssh-keygen -t rsa -b 4096 -f ~/.oci_keys/oci_homelab_key
chmod 400 ~/.oci_keys/oci_homelab_key # Restrict permissions on the private key
```
* This creates `oci_homelab_key` (private key) and `oci_homelab_key.pub` (public key). You'll upload the `.pub` file to OCI.
-----
## **Phase 2: Core Infrastructure Deployment (CLI/Console Hybrid)**
Now for the main components.
### **4. Arm Compute Instance (Your Linux Workstation)**
* **Purpose:** Your primary environment for data analysis, ML, and Linux power-user tasks.
* **How to provision (Console - easiest for first time, then manage with CLI):**
1. **OCI Console:** `Compute` \> `Instances` \> `Create Instance`.
2. **Name:** `data-science-vm`
3. **Compartment:** Select your `homelab` compartment.
4. **Placement:** Keep default Availability Domain. Ensure "Always Free Eligible" is selected if it appears as an option.
5. **Image and Shape:**
* Click `Edit` for **Image**. Choose a **Platform Image**. I highly recommend `Ubuntu` (e.g., `Ubuntu 22.04` or `24.04` if available) or `Oracle Linux` for best ARM compatibility.
* Click `Edit` for **Shape**. Select `Virtual Machine`. Under `Shape series`, choose `Ampere Arm-based`. Select `VM.Standard.A1.Flex`. Use the sliders to allocate **4 OCPUs and 24 GB Memory**. This consumes your full Always Free quota for A1 Flex.
6. **Networking:**
* Select the `Virtual cloud network` you created (`homelab-vcn`).
* Select the `Public Subnet` you created (e.g., `homelab-vcn-public-subnet`).
* Ensure `Assign a public IPv4 address` is **checked**.
7. **SSH Keys:** Choose `Upload public key files` and browse to your `~/.oci_keys/oci_homelab_key.pub` file.
8. **Boot Volume:** Keep default (50GB). You can attach more Block Storage later if needed.
9. Click `Create`. Wait for the instance to transition to `Running` state. Note its **Public IP address**.
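
The same launch can be scripted once you know the OCIDs involved. This is a sketch under assumptions: the OCI CLI is configured, and `COMPARTMENT_OCID`, `SUBNET_OCID`, `IMAGE_OCID`, and `AVAILABILITY_DOMAIN` are variables you export yourself (none of them is defined elsewhere in this guide). `launch_a1_instance` is a hypothetical wrapper name.

```bash
launch_a1_instance() {
  # Refuse to run until every required variable is set.
  for var in COMPARTMENT_OCID SUBNET_OCID IMAGE_OCID AVAILABILITY_DOMAIN; do
    eval "val=\${$var:-}"
    if [ -z "$val" ]; then
      echo "launch_a1_instance: $var is not set" >&2
      return 1
    fi
  done

  # Same choices as the Console steps: A1 Flex, 4 OCPUs, 24 GB, public IP, uploaded key.
  oci compute instance launch \
    --display-name data-science-vm \
    --compartment-id "$COMPARTMENT_OCID" \
    --availability-domain "$AVAILABILITY_DOMAIN" \
    --shape VM.Standard.A1.Flex \
    --shape-config '{"ocpus": 4, "memoryInGBs": 24}' \
    --image-id "$IMAGE_OCID" \
    --subnet-id "$SUBNET_OCID" \
    --assign-public-ip true \
    --ssh-authorized-keys-file "$HOME/.oci_keys/oci_homelab_key.pub"
}
```

Export the four variables, then call `launch_a1_instance`. Commands such as `oci iam availability-domain list` and `oci compute image list --compartment-id "$COMPARTMENT_OCID" --shape VM.Standard.A1.Flex` are one way to look up the availability domain name and a matching image OCID.
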
### **5. Connect to Your Arm Instance**
* **On your local machine:**
```bash
ssh -i ~/.oci_keys/oci_homelab_key ubuntu@<YOUR_VM_PUBLIC_IP>
# (or opc@<YOUR_VM_PUBLIC_IP> if you chose Oracle Linux)
```
* **First login:** Update packages: `sudo apt update && sudo apt upgrade -y` (for Ubuntu)
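
To avoid retyping the key path and IP address, you can record the connection in `~/.ssh/config`. The following is a minimal sketch; `oci-homelab` is a hypothetical alias and `write_ssh_host_entry` is a hypothetical helper name.

```bash
# Append a Host block to an SSH config file.
write_ssh_host_entry() {
  # $1 = config file, $2 = host alias, $3 = public IP, $4 = login user
  cat >> "$1" <<EOF

Host $2
    HostName $3
    User $4
    IdentityFile ~/.oci_keys/oci_homelab_key
    IdentitiesOnly yes
EOF
}

# Example (fill in the IP first; use opc instead of ubuntu on Oracle Linux):
# write_ssh_host_entry ~/.ssh/config oci-homelab <YOUR_VM_PUBLIC_IP> ubuntu
# ssh oci-homelab
```

Once the entry exists, `scp` and other SSH-based tools can use the `oci-homelab` alias as well.
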
### **6. Oracle Autonomous Database (ATP/ADW)**
* **Purpose:** Your managed SQL database for data analysis.
* **How to provision (Console Recommended):**
1. **OCI Console:** `Oracle Database` \> `Autonomous Database` \> `Create Autonomous Database`.
2. **Compartment:** Select your `homelab` compartment.
3. **Display name:** `my-data-warehouse` (or `my-data-lakehouse`)
4. **Database name:** `DATALAB` (can be different from display name, alphanumeric only)
5. **Workload type:** For data analysis, choose **`Data Warehouse` (ADW)**. If you primarily do transactional work, choose `Transaction Processing (ATP)`.
6. **Deployment type:** `Serverless` (this is the Always Free option).
7. **Always Free:** Ensure the "Always Free" option is enabled. This fixes compute at a small allocation that cannot be scaled and caps storage at 20 GB.
8. **Administrator Credentials:** Set a strong `ADMIN` password. *Remember this!*
9. **Networking:** Choose one network access type:
* **Secure access from everywhere:** Simplest for a home lab initially; connections still require the wallet.
* **Secure access from specified IP addresses and VCNs:** Strongly consider this for production, allowing only your `homelab-vcn` or your A1 instance's IP.
* **Private endpoint:** Only this option asks for a **Virtual Cloud Network** (`homelab-vcn`) and a **Subnet** (`homelab-vcn-private-subnet`). It may not be offered for Always Free databases, in which case use one of the two options above.
10. **License Type:** `License Included`.
11. Click `Create Autonomous Database`. Wait for it to become `Available`.
### **7. Download Autonomous Database Wallet**
* **Purpose:** Securely connect your tools (Jupyter, RStudio, SQL clients) to the Autonomous Database.
* **How to provision:**
1. In the OCI Console, navigate to your `my-data-warehouse` Autonomous Database details page.
2. Click `Database Connection`.
3. Click `Download Wallet`.
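4. Choose a strong `Wallet Password`. *Remember this!*
5. Save the wallet zip file to your local machine. This guide calls it `Wallet_my-data-warehouse.zip`, but OCI typically names the file after the database name (for example `Wallet_DATALAB.zip`), so adjust the file name in later commands to match. You'll transfer this to your A1 instance.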
### **8. Object Storage Bucket**
* **Purpose:** Store raw data, scripts, ML models, and backups.
* **How to provision:**
1. **OCI Console:** `Storage` \> `Object Storage & Archive Storage` \> `Buckets`.
2. Click `Create Bucket`.
3. **Bucket Name:** `homelab-data-bucket` (must be unique within your tenancy's Object Storage namespace).
4. **Compartment:** Select your `homelab` compartment.
5. **Storage Tier:** Keep the `Standard` storage tier (it counts toward the 20 GB Always Free Object Storage allowance).
6. Keep other defaults. Click `Create`.
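
From the A1 instance or your local machine, the OCI CLI can create the bucket and push files into it. The sketch below assumes a configured CLI; `upload_to_bucket` is a hypothetical helper name, and `COMPARTMENT_OCID` and `raw_data.csv` in the commented example are placeholders.

```bash
BUCKET="homelab-data-bucket"

# Upload one local file to the bucket, using its base name as the object name.
upload_to_bucket() {
  # $1 = path to a local file
  if [ ! -f "$1" ]; then
    echo "upload_to_bucket: no such file: $1" >&2
    return 1
  fi
  oci os object put --bucket-name "$BUCKET" --file "$1" --name "$(basename "$1")"
}

# Create the bucket once, then upload:
# oci os bucket create --compartment-id "$COMPARTMENT_OCID" --name "$BUCKET"
# upload_to_bucket ./raw_data.csv
```

The helper checks that the file exists before calling the CLI, so a mistyped path fails early with a clear message.
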
### **9. Security List Rules (for your Public Subnet)**
* **Purpose:** Control inbound/outbound traffic to your A1 instance. By default, SSH (port 22) is usually open. You might need more for web interfaces.
* **How to provision (CLI or Console):**
* **Identify your Public Subnet:** In the OCI Console, go to `Networking` \> `Virtual Cloud Networks`, click your `homelab-vcn`, then `Subnets`, and find your `homelab-vcn-public-subnet`. Click on its name, then click on the **Default Security List** associated with it.
* **Add Ingress Rules (if needed):**
* If you plan to run a web server (e.g., JupyterHub, RStudio Server, a simple dashboard) on your A1 instance:
* Click `Add Ingress Rules`.
* **Source CIDR:** `0.0.0.0/0` (allow from anywhere, *be cautious in production*). For home lab, you might restrict to your home IP if it's static.
* **Destination Port Range:** `8888` (for Jupyter), `8787` (for RStudio Server), `80` (HTTP), `443` (HTTPS) as needed.
* **Protocol:** `TCP`
* **Description:** "Allow Jupyter/RStudio access"
* **For ICMP (ping):** (Optional, for troubleshooting connectivity)
* Source CIDR: `0.0.0.0/0`
* IP Protocol: `ICMP`
* Type: `8` (Echo Request)
* Code: `0`
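
The ingress rules can also be kept in a JSON file and applied with the CLI. This is a sketch under assumptions: `SOURCE_CIDR` is your home IP as a `/32` (the `203.0.113.10` default is a documentation placeholder), and `SECURITY_LIST_OCID` is a placeholder for the Default Security List's OCID. Because `oci network security-list update` replaces the full ingress list, every rule you want to keep (SSH, plus any ICMP rules already present) has to appear in the file.

```bash
SOURCE_CIDR="${SOURCE_CIDR:-203.0.113.10/32}"
RULES_FILE="${RULES_FILE:-ingress-rules.json}"

# Protocol "6" is TCP. The SSH rule is restated so the update does not drop it.
cat > "$RULES_FILE" <<EOF
[
  {
    "source": "0.0.0.0/0",
    "protocol": "6",
    "isStateless": false,
    "tcpOptions": {"destinationPortRange": {"min": 22, "max": 22}},
    "description": "SSH"
  },
  {
    "source": "$SOURCE_CIDR",
    "protocol": "6",
    "isStateless": false,
    "tcpOptions": {"destinationPortRange": {"min": 8888, "max": 8888}},
    "description": "Allow Jupyter access"
  }
]
EOF

# Apply after reviewing the file:
# oci network security-list update \
#   --security-list-id "$SECURITY_LIST_OCID" \
#   --ingress-security-rules "file://$RULES_FILE"
```

Restricting the Jupyter rule to your own address rather than `0.0.0.0/0` follows the caution given above; add similar entries for `8787`, `80`, or `443` only if you need them.
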
-----
## **Phase 3: Software Setup on Your A1 Instance (Linux Power User Focus)**
This is where you bring your Linux skills to bear.
### **1. Transfer Files to A1 Instance**
* **Upload ADB Wallet:** `scp -i ~/.oci_keys/oci_homelab_key ~/Downloads/Wallet_my-data-warehouse.zip ubuntu@<YOUR_VM_PUBLIC_IP>:/home/ubuntu/`
* **Upload other scripts/data** as needed.
### **2. Setup Python & Data Science Environment**
* **Install Miniconda/Anaconda:**
```bash
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-aarch64.sh # For ARM
bash Miniconda3-latest-Linux-aarch64.sh
source ~/.bashrc # or restart terminal
conda create -n datascience python=3.10 numpy pandas matplotlib scikit-learn jupyterlab ipython -y
conda activate datascience
```
* **Install `python-oracledb` (for ADB connectivity):** The driver is published on PyPI under the package name `oracledb`.
```bash
pip install oracledb
```
* **Unzip ADB Wallet:**
```bash
sudo apt install unzip # if not installed
unzip Wallet_my-data-warehouse.zip -d /home/ubuntu/oracle_wallet # Extract to a dedicated dir
```
* **Configure `TNS_ADMIN`:** Add this to your `~/.bashrc` (or environment variables for your Jupyter/RStudio startup script):
```bash
export TNS_ADMIN=/home/ubuntu/oracle_wallet
```
Then reload the file so the variable takes effect in your current shell: `source ~/.bashrc`
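* *Note:* `python-oracledb` runs in Thin mode by default, which connects without Oracle Instant Client; Instant Client is only needed if you switch to Thick mode. In Thin mode, pass the wallet directory and wallet password to the connect call (`wallet_location` and `wallet_password`, alongside `config_dir`). Follow Oracle's documentation if you encounter issues.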
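
When you connect, the DSN is normally one of the aliases defined in the wallet's `tnsnames.ora` (for a database named `DATALAB`, typically names like `datalab_high` or `datalab_low`). The sketch below lists them; it assumes each alias starts its own line, which is how the downloaded file is usually laid out, and `list_tns_aliases` is a hypothetical helper name.

```bash
WALLET_DIR="${TNS_ADMIN:-/home/ubuntu/oracle_wallet}"

# Print the connection aliases defined in a tnsnames.ora file, one per line.
list_tns_aliases() {
  sed -n 's/^[[:space:]]*\([A-Za-z0-9_]\{1,\}\)[[:space:]]*=.*/\1/p' "$1"
}

if [ -f "$WALLET_DIR/tnsnames.ora" ]; then
  list_tns_aliases "$WALLET_DIR/tnsnames.ora"
else
  echo "No tnsnames.ora found in $WALLET_DIR yet" >&2
fi
```

Pick the alias that matches the service level you want and use it as the `dsn` value in your Python or R connection code.
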
### **3. Setup R and RStudio Server**
* **Install R:**
```bash
sudo apt update
sudo apt install r-base r-base-dev -y
```
* **Install RStudio Server (ARM64):**
* RStudio (Posit) provides official ARM64 daily builds for Ubuntu. Check their daily builds page (e.g., `dailies.rstudio.com/rstudio/latest/`) and look for `arm64.deb` packages for your Ubuntu version.
* Example (fill in the real package URL first):
```bash
# Set this to the ARM64 (aarch64) RStudio *Server* .deb link for your Ubuntu version,
# copied from Posit's daily builds page. Do not use amd64 or Desktop packages.
RSTUDIO_DEB_URL="<ARM64_RSTUDIO_SERVER_DEB_URL>"
wget -O rstudio-server-arm64.deb "$RSTUDIO_DEB_URL"
dpkg-deb --field rstudio-server-arm64.deb Architecture # Should print arm64
sudo dpkg -i rstudio-server-arm64.deb
sudo apt install -f -y # Fix any dependency issues
# If no official ARM64 .deb exists for your release, you might need to compile
# or find a community build. One script-based option to check:
# https://github.com/jrowen/ARM-rstudio-server
```
* **Configure RStudio Server:** Create a user account if you don't want to use `ubuntu` user for RStudio Server login.
* Access RStudio Server: `http://<YOUR_VM_PUBLIC_IP>:8787` (after opening port 8787 in security list).
### **4. JupyterLab Setup**
* **Run JupyterLab (from your A1 instance's terminal):**
```bash
conda activate datascience
jupyter lab --no-browser --port=8888 --ip=0.0.0.0 # Run as your normal user; --allow-root is only needed when running as root, which is best avoided
```
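* **Access JupyterLab:** `http://<YOUR_VM_PUBLIC_IP>:8888` (after opening port 8888 in the security list). Log in with the token from the URL that `jupyter lab` prints at startup. If the page still does not load, check the instance's own firewall (for example `iptables` or `ufw`), since the security list only controls traffic at the VCN level.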
* **For persistent/background Jupyter:** Consider `screen` or `tmux`, or even `systemd` services.
* `sudo apt install screen -y`
* `screen -S jupyter`
* Then run the `jupyter lab` command inside the screen session. Detach with `Ctrl+A, D`. Reattach with `screen -r jupyter`.
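
For the `systemd` route mentioned above, a unit file might look like the one generated below. This is a sketch, not a tested service definition: it assumes Miniconda was installed to its default location (`/home/ubuntu/miniconda3`), and both the unit name `jupyterlab.service` and the helper `render_jupyter_unit` are hypothetical.

```bash
# Emit a systemd unit that runs JupyterLab from a conda environment.
render_jupyter_unit() {
  # $1 = login user, $2 = path to the conda environment
  cat <<EOF
[Unit]
Description=JupyterLab (datascience env)
After=network.target

[Service]
User=$1
WorkingDirectory=/home/$1
ExecStart=$2/bin/jupyter lab --no-browser --port=8888 --ip=0.0.0.0
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
}

render_jupyter_unit ubuntu /home/ubuntu/miniconda3/envs/datascience > jupyterlab.service

# Install and start it after reviewing the generated file:
# sudo cp jupyterlab.service /etc/systemd/system/
# sudo systemctl daemon-reload
# sudo systemctl enable --now jupyterlab.service
```

Calling the environment's own `jupyter` binary by absolute path avoids having to activate conda inside the service. `journalctl -u jupyterlab.service` then shows the startup log, including the login token.
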
-----
## **Phase 4: Optimization & Monitoring**
### **1. OCI Console Dashboards**
* **Purpose:** Visualize the health and usage of your OCI resources.
* **How to provision:**
1. **OCI Console:** `Observability & Management` \> `Console Dashboards`.
2. `Create Dashboard`.
3. Add widgets for:
* **Compute:** CPU utilization, Memory utilization (if agent installed), Network I/O for your `data-science-vm`.
* **Autonomous Database:** CPU utilization, Active Sessions, Storage Used.
* **Object Storage:** Bucket size, PUT/GET requests.
4. Set up **Alarms** (`Observability & Management` \> `Monitoring` \> `Alarm Definitions`) for CPU usage on your A1 instance nearing 100%, or Autonomous Database storage nearing limits, to avoid issues.
### **2. OCI Vault (for API Keys & Credentials)**
* **Purpose:** Securely store sensitive credentials (e.g., API keys for other services, if you expand your lab).
* **How to provision (Console):**
1. **OCI Console:** `Identity & Security` \> `Vault`.
2. `Create Vault`.
3. Inside your vault, create **Keys** (for encryption) and then **Secrets** (your actual credentials).
4. Access these secrets from your A1 instance using the OCI SDK (Python SDK is excellent for this).
### **3. Network Security Groups (NSGs) (Advanced)**
* **Purpose:** More granular network control than Security Lists, allowing you to apply rules directly to VNICs, not subnets.
* **Why use it:** If you add more instances or diversify services within the same subnet, NSGs give you more precise control.
* **How to provision:**
1. **OCI Console:** `Networking` \> `Virtual Cloud Networks` \> `homelab-vcn` \> `Network Security Groups`.
2. Create an NSG (e.g., `data-science-nsg`). Add ingress rules specific to your A1 instance (SSH, Jupyter, RStudio).
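3. When you create/edit your A1 instance, assign its VNIC to this NSG. NSG rules supplement the subnet's security list rather than override it: traffic is allowed if either one permits it, so tighten the security list as well if the NSG should be the only gate.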
-----
This detailed plan focuses on setting up your core OCI environment for data science and Linux power usage. Remember to keep all credentials (SSH keys, ADB passwords, Wallet passwords) extremely secure. Have fun building out your powerful free home lab!