the_information_nexus/projects/oci_project.md
2025-06-26 13:39:26 +00:00

This is where the fun begins for a Linux power user! Provisioning your OCI Always Free home lab involves setting up the foundational network, the core compute instance, the database, and then integrating them. We'll prioritize CLI/SSH for a power-user feel, but mention Console options.

Here's a step-by-step guide for provisioning your OCI build:


Phase 1: Initial Setup (Console First)

Even for a Linux power user, the compartment and VCN are most straightforward to create in the OCI Console on the first pass; you can manage them from the CLI afterwards.

1. Compartment

  • Purpose: Organize your resources and isolate them. Good practice for even a home lab.
  • How to provision:
    • OCI Console: Identity & Security > Compartments > Create Compartment.
    • Name: homelab (or data_lab, ml_dev, etc.)
    • Description: "My OCI Always Free Home Lab Resources"

2. Virtual Cloud Network (VCN)

  • Purpose: Your private, isolated network in OCI. All your resources will live within this.
  • How to provision:
    • OCI Console: Networking > Virtual Cloud Networks.
    • Click Start VCN Wizard > VCN with Internet Connectivity.
    • Name: homelab-vcn
    • Compartment: Select your homelab compartment.
    • Leave defaults for CIDR blocks and subnets (a public and private subnet will be created, along with an Internet Gateway and NAT Gateway). This is generally fine for a basic home lab.
    • Crucial: Ensure "Enable Public Subnet" and "Enable Private Subnet" are checked.
    • Click Create VCN.
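
Before adding resources, it can help to sanity-check the address plan. The sketch below assumes the wizard used 10.0.0.0/16 for the VCN with 10.0.0.0/24 and 10.0.1.0/24 for the public and private subnets; those values are assumptions, so substitute the CIDRs shown on your VCN details page. `check_layout` is a hypothetical helper, not part of any OCI tool.

```python
import ipaddress

# Assumed wizard defaults; replace with the values from your VCN details page.
VCN_CIDR = "10.0.0.0/16"
PUBLIC_SUBNET_CIDR = "10.0.0.0/24"
PRIVATE_SUBNET_CIDR = "10.0.1.0/24"


def check_layout(vcn_cidr, subnet_cidrs):
    """Return a list of problems: subnets outside the VCN or overlapping each other."""
    vcn = ipaddress.ip_network(vcn_cidr)
    subnets = [ipaddress.ip_network(cidr) for cidr in subnet_cidrs]
    problems = []
    for net in subnets:
        if not net.subnet_of(vcn):
            problems.append(f"{net} is not inside {vcn}")
    for index, first in enumerate(subnets):
        for second in subnets[index + 1:]:
            if first.overlaps(second):
                problems.append(f"{first} overlaps {second}")
    return problems


if __name__ == "__main__":
    issues = check_layout(VCN_CIDR, [PUBLIC_SUBNET_CIDR, PRIVATE_SUBNET_CIDR])
    print(issues or "layout looks consistent")
```

An empty list means every subnet sits inside the VCN range and none collide, which is what you want before adding more subnets later.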

3. Generate/Upload SSH Key Pair

  • Purpose: Securely connect to your Linux instances. You'll need this before launching the instance.
  • How to provision:
    • On your local machine (Linux/macOS):
      mkdir -p ~/.oci_keys && chmod 700 ~/.oci_keys # -p: no error if the directory already exists
      ssh-keygen -b 2048 -t rsa -f ~/.oci_keys/oci_homelab_key
      chmod 400 ~/.oci_keys/oci_homelab_key # Restrict permissions on the private key
      
    • This creates oci_homelab_key (private key) and oci_homelab_key.pub (public key). You'll upload the .pub file to OCI.
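
The ssh client refuses private keys that group or other users can read, which is why the chmod step matters. As a minimal sketch, the hypothetical helper below checks that only the owner has any permission bits set on the key file; the path in the usage block matches the one used in this guide.

```python
import os
import stat


def private_key_is_locked_down(path):
    """True if no group/other permission bits are set (for example mode 400 or 600)."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return (mode & (stat.S_IRWXG | stat.S_IRWXO)) == 0


if __name__ == "__main__":
    key_path = os.path.expanduser("~/.oci_keys/oci_homelab_key")
    if private_key_is_locked_down(key_path):
        print("key permissions look fine")
    else:
        print("key is too open; run: chmod 400", key_path)
```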

Phase 2: Core Infrastructure Deployment (CLI/Console Hybrid)

Now for the main components.

4. Arm Compute Instance (Your Linux Workstation)

  • Purpose: Your primary environment for data analysis, ML, and Linux power-user tasks.
  • How to provision (Console - easiest for first time, then manage with CLI):
    1. OCI Console: Compute > Instances > Create Instance.
    2. Name: data-science-vm
    3. Compartment: Select your homelab compartment.
    4. Placement: Keep default Availability Domain. Ensure "Always Free Eligible" is selected if it appears as an option.
    5. Image and Shape:
      • Click Edit for Image. Choose a Platform Image. I highly recommend Ubuntu (e.g., Ubuntu 22.04 or 24.04 if available) or Oracle Linux for best ARM compatibility.
      • Click Edit for Shape. Select Virtual Machine. Under Shape series, choose Ampere Arm-based. Select VM.Standard.A1.Flex. Use the sliders to allocate 4 OCPUs and 24 GB Memory. This consumes your full Always Free quota for A1 Flex.
    6. Networking:
      • Select the Virtual cloud network you created (homelab-vcn).
      • Select the Public Subnet you created (e.g., homelab-vcn-public-subnet).
      • Ensure Assign a public IPv4 address is checked.
    7. SSH Keys: Choose Upload public key files and browse to your ~/.oci_keys/oci_homelab_key.pub file.
    8. Boot Volume: Keep the default size (roughly 50 GB). It counts against the 200 GB Always Free block storage allowance, and you can attach more Block Storage later if needed.
    9. Click Create. Wait for the instance to transition to Running state. Note its Public IP address.
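
Because the A1 Flex allowance is a shared pool (4 OCPUs and 24 GB in total, as noted above), it is worth doing the arithmetic before splitting it across several smaller instances. The sketch below is illustrative only; `A1Instance` and `remaining_quota` are hypothetical names, and the instance sizes in the usage block are made up.

```python
from dataclasses import dataclass

# Always Free A1 Flex totals as stated in this guide.
FREE_OCPUS = 4
FREE_MEMORY_GB = 24


@dataclass
class A1Instance:
    name: str
    ocpus: int
    memory_gb: int


def remaining_quota(instances):
    """Return (ocpus_left, memory_gb_left); a negative value means over quota."""
    used_ocpus = sum(instance.ocpus for instance in instances)
    used_memory = sum(instance.memory_gb for instance in instances)
    return FREE_OCPUS - used_ocpus, FREE_MEMORY_GB - used_memory


if __name__ == "__main__":
    plan = [A1Instance("data-science-vm", 2, 12), A1Instance("scratch-vm", 1, 6)]
    print(remaining_quota(plan))
```

A single 4 OCPU / 24 GB instance, as configured in this guide, leaves nothing for a second A1 instance.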

5. Connect to Your Arm Instance

  • On your local machine:
    ssh -i ~/.oci_keys/oci_homelab_key ubuntu@<YOUR_VM_PUBLIC_IP>
    # (or opc@<YOUR_VM_PUBLIC_IP> if you chose Oracle Linux)
    
  • First login: Update packages: sudo apt update && sudo apt upgrade -y (for Ubuntu)

6. Oracle Autonomous Database (ATP/ADW)

  • Purpose: Your managed SQL database for data analysis.
  • How to provision (Console Recommended):
    1. OCI Console: Oracle Database > Autonomous Database > Create Autonomous Database.
    2. Compartment: Select your homelab compartment.
    3. Display name: my-data-warehouse (or my-data-lakehouse)
    4. Database name: DATALAB (can be different from display name, alphanumeric only)
    5. Workload type: For data analysis, choose Data Warehouse (ADW). If you primarily do transactional work, choose Transaction Processing (ATP).
    6. Deployment type: Serverless (this is the Always Free option).
    7. Always Free: Ensure the "Always Free" toggle is on. This fixes compute at a small, non-scalable allocation and caps storage at 20 GB.
    8. Administrator Credentials: Set a strong ADMIN password. Remember this!
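    9. Network access: the console offers mutually exclusive access types, so pick one.
      • Secure access from everywhere: simplest for an initial home lab. Connections still require the wallet (mutual TLS).
      • Secure access from specified IP addresses and VCNs: tighter, and strongly recommended for anything production-like. Allowlist only the addresses that need access, such as your home IP and your A1 instance.
      • Private endpoint access: only this option asks for a Virtual Cloud Network (homelab-vcn) and Subnet (homelab-vcn-private-subnet). It may not be offered for Always Free databases.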
    10. License Type: License Included.
    11. Click Create Autonomous Database. Wait for it to become Available.

7. Download Autonomous Database Wallet

  • Purpose: Securely connect your tools (Jupyter, RStudio, SQL clients) to the Autonomous Database.
  • How to provision:
    1. In the OCI Console, navigate to your my-data-warehouse Autonomous Database details page.
    2. Click Database Connection.
    3. Click Download Wallet.
    4. Choose a strong Wallet Password. Remember this!
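    5. Save the Wallet_my-data-warehouse.zip file to your local machine. The console usually names the file after the database name rather than the display name (for example Wallet_DATALAB.zip), so adjust the file name in later commands if yours differs. You'll transfer this to your A1 instance.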

8. Object Storage Bucket

  • Purpose: Store raw data, scripts, ML models, and backups.
  • How to provision:
    1. OCI Console: Storage > Object Storage & Archive Storage > Buckets.
    2. Click Create Bucket.
    3. Bucket Name: homelab-data-bucket (must be unique within your tenancy's Object Storage namespace).
    4. Compartment: Select your homelab compartment.
    5. Storage tier: Keep Standard (usage counts toward the 20 GB Always Free Object Storage allowance).
    6. Keep other defaults. Click Create.
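
Once the bucket exists, files can be pushed from the A1 instance with the OCI Python SDK. The following is a minimal sketch, assuming the `oci` package is installed and a default profile exists in ~/.oci/config; `object_name_for` and `upload_file` are hypothetical helper names, and the raw/ prefix is just an example naming convention.

```python
from pathlib import Path


def object_name_for(local_path, prefix="raw/"):
    """Map a local file to an object name such as 'raw/sales.csv'."""
    return f"{prefix}{Path(local_path).name}"


def upload_file(local_path, bucket_name="homelab-data-bucket", prefix="raw/"):
    """Upload one file to the bucket using the default ~/.oci/config profile."""
    import oci  # imported lazily so object_name_for works without the SDK

    config = oci.config.from_file()
    client = oci.object_storage.ObjectStorageClient(config)
    namespace = client.get_namespace().data
    object_name = object_name_for(local_path, prefix)
    with open(local_path, "rb") as body:
        client.put_object(namespace, bucket_name, object_name, body)
    return object_name


if __name__ == "__main__":
    # Hypothetical file path, for illustration only.
    print(upload_file("/home/ubuntu/data/sales.csv"))
```

Prefixes such as raw/ or models/ act like folders in the console and keep data, scripts, and backups separated inside a single bucket.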

9. Security List Rules (for your Public Subnet)

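  • Purpose: Control inbound/outbound traffic to your A1 instance. The default security list normally allows SSH (port 22) already; web interfaces need additional ingress rules. The instance's own firewall (iptables on the Ubuntu images, firewalld on Oracle Linux) must also allow any port you open here.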
  • How to provision (CLI or Console):
    • Identify your Public Subnet: In the OCI Console, go to Networking > Virtual Cloud Networks, click your homelab-vcn, then Subnets, and find your homelab-vcn-public-subnet. Click on its name, then click on the Default Security List associated with it.
    • Add Ingress Rules (if needed):
      • If you plan to run a web server (e.g., JupyterHub, RStudio Server, a simple dashboard) on your A1 instance:
        • Click Add Ingress Rules.
        • Source CIDR: 0.0.0.0/0 allows traffic from any address, so be cautious. For a home lab with a static home IP, restrict the rule to that single address as a /32 (for example 203.0.113.7/32).
        • Destination Port Range: 8888 (for Jupyter), 8787 (for RStudio Server), 80 (HTTP), 443 (HTTPS) as needed.
        • Protocol: TCP
        • Description: "Allow Jupyter/RStudio access"
      • For ICMP (ping): (Optional, for troubleshooting connectivity)
        • Source CIDR: 0.0.0.0/0
        • IP Protocol: ICMP
        • Type: 8 (Echo Request)
        • Code: 0
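
After adding a rule, a quick TCP probe from your local machine confirms whether the port is actually reachable. This is a small sketch using only the standard library; `port_is_open` is a hypothetical helper and the IP placeholder must be replaced with your instance's address.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    vm_ip = "<YOUR_VM_PUBLIC_IP>"  # replace with the instance's public IP
    for port in (22, 8888, 8787):
        print(port, "open" if port_is_open(vm_ip, port) else "closed or filtered")
```

A closed result can mean the security list rule is missing, the instance firewall blocks the port, or nothing is listening on it yet.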

Phase 3: Software Setup on Your A1 Instance (Linux Power User Focus)

This is where you bring your Linux skills to bear.

1. Transfer Files to A1 Instance

  • Upload ADB Wallet: scp -i ~/.oci_keys/oci_homelab_key ~/Downloads/Wallet_my-data-warehouse.zip ubuntu@<YOUR_VM_PUBLIC_IP>:/home/ubuntu/
  • Upload other scripts/data as needed.

2. Setup Python & Data Science Environment

  • Install Miniconda/Anaconda:
    wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-aarch64.sh # For ARM
    bash Miniconda3-latest-Linux-aarch64.sh
    source ~/.bashrc # or restart terminal
    conda create -n datascience python=3.10 numpy pandas matplotlib scikit-learn jupyterlab ipython -y
    conda activate datascience
    
  • Install python-oracledb (for ADB connectivity):
    pip install oracledb # PyPI package name of the python-oracledb driver
    
  • Unzip ADB Wallet:
    sudo apt install unzip # if not installed
    unzip Wallet_my-data-warehouse.zip -d /home/ubuntu/oracle_wallet # Extract to a dedicated dir
    
    • Configure TNS_ADMIN: Add this to your ~/.bashrc (or environment variables for your Jupyter/RStudio startup script):
      export TNS_ADMIN=/home/ubuntu/oracle_wallet
      
      source ~/.bashrc
    • Note: python-oracledb runs in Thin mode by default, which connects without Oracle Instant Client. Install Instant Client only if you need Thick mode features, and follow Oracle's documentation in that case.
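
With the wallet extracted, a first connection test might look like the sketch below. It assumes the driver is importable as `oracledb`, that the wallet directory contains tnsnames.ora and ewallet.pem, and that the passwords are supplied through two hypothetical environment variables (ADB_PASSWORD and WALLET_PASSWORD). `list_tns_aliases` and `connect` are illustrative helpers, not part of the driver.

```python
import os
import re
from pathlib import Path


def list_tns_aliases(tnsnames_text):
    """Return the connection aliases defined at the start of lines in tnsnames.ora."""
    return re.findall(r"^[ \t]*([A-Za-z0-9_.-]+)[ \t]*=", tnsnames_text, flags=re.MULTILINE)


def connect(user, password, alias, wallet_dir, wallet_password):
    """Open a Thin mode connection using the wallet directory for config and certificates."""
    import oracledb  # imported lazily so list_tns_aliases works without the driver

    return oracledb.connect(
        user=user,
        password=password,
        dsn=alias,
        config_dir=wallet_dir,
        wallet_location=wallet_dir,
        wallet_password=wallet_password,
    )


if __name__ == "__main__":
    wallet_dir = os.environ.get("TNS_ADMIN", "/home/ubuntu/oracle_wallet")
    aliases = list_tns_aliases(Path(wallet_dir, "tnsnames.ora").read_text())
    print("available aliases:", aliases)
    with connect("ADMIN", os.environ["ADB_PASSWORD"], aliases[0],
                 wallet_dir, os.environ["WALLET_PASSWORD"]) as connection:
        with connection.cursor() as cursor:
            cursor.execute("select sysdate from dual")
            print(cursor.fetchone())
```

The aliases in tnsnames.ora typically end in suffixes such as _low, _medium, and _high; for interactive analysis, a lower-priority alias is usually a reasonable starting point.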

3. Setup R and RStudio Server

  • Install R:
    sudo apt update
    sudo apt install r-base r-base-dev -y
    
  • Install RStudio Server (ARM64):
    • RStudio (Posit) provides official ARM64 daily builds for Ubuntu. Check their daily builds page (e.g., dailies.rstudio.com/rstudio/latest/) and look for the arm64 .deb of RStudio Server (not RStudio Desktop) that matches your Ubuntu version.
    • Example (adjust version as needed):
      # Download the RStudio Server package. It must be arm64/aarch64;
      # amd64 (x86) packages and RStudio Desktop packages will not work here.
      wget <RSTUDIO_SERVER_ARM64_DEB_URL>
      # If there is no official .deb for your release, you might need to compile
      # or find a community build. One script-based option:
      # https://github.com/jrowen/ARM-rstudio-server

      # Assuming you found a correct ARM64 .deb:
      sudo dpkg -i rstudio-server-202X.XX.X-XXX-arm64.deb
      sudo apt install -f # Fix any dependency issues
      
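    • Configure RStudio Server: It authenticates against system accounts using a password. The default ubuntu user has no password, so either set one (sudo passwd ubuntu) or create a dedicated account for RStudio Server logins (sudo adduser <name>).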
    • Access RStudio Server: http://<YOUR_VM_PUBLIC_IP>:8787 (after opening port 8787 in security list).

4. JupyterLab Setup

  • Run JupyterLab (from your A1 instance's terminal):
    conda activate datascience
    jupyter lab --no-browser --port=8888 --ip=0.0.0.0 # add --allow-root only when running as root; a dedicated non-root user is safer
    
  • Access JupyterLab: http://<YOUR_VM_PUBLIC_IP>:8888 (after opening port 8888 in security list).
  • For persistent/background Jupyter: Consider screen or tmux, or even systemd services.
    • sudo apt install screen -y
    • screen -S jupyter
    • Then run the jupyter lab command inside the screen session. Detach with Ctrl+A, D. Reattach with screen -r jupyter.
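
For the systemd route mentioned above, the unit file is short enough to generate. The sketch below renders one as text; it assumes Miniconda was installed to its default location under /home/ubuntu/miniconda3 and that the service is called jupyter.service, both of which are assumptions to adjust. `render_jupyter_unit` is a hypothetical helper.

```python
def render_jupyter_unit(
    user="ubuntu",
    jupyter_bin="/home/ubuntu/miniconda3/envs/datascience/bin/jupyter",
    port=8888,
):
    """Return the text of a systemd unit that runs JupyterLab as the given user."""
    return f"""[Unit]
Description=JupyterLab
After=network.target

[Service]
User={user}
WorkingDirectory=/home/{user}
ExecStart={jupyter_bin} lab --no-browser --port={port} --ip=0.0.0.0
Restart=on-failure

[Install]
WantedBy=multi-user.target
"""


if __name__ == "__main__":
    # Print the unit so it can be reviewed before being installed.
    print(render_jupyter_unit())
```

Save the output as /etc/systemd/system/jupyter.service, then run sudo systemctl daemon-reload followed by sudo systemctl enable --now jupyter. Calling the environment's jupyter binary by full path avoids needing conda activate inside the service.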

Phase 4: Optimization & Monitoring

1. OCI Console Dashboards

  • Purpose: Visualize the health and usage of your OCI resources.
  • How to provision:
    1. OCI Console: Observability & Management > Console Dashboards.
    2. Create Dashboard.
    3. Add widgets for:
      • Compute: CPU utilization, Memory utilization (if agent installed), Network I/O for your data-science-vm.
      • Autonomous Database: CPU utilization, Active Sessions, Storage Used.
      • Object Storage: Bucket size, PUT/GET requests.
    4. Set up Alarms (Observability & Management > Monitoring > Alarm Definitions) for CPU usage on your A1 instance nearing 100%, or Autonomous Database storage nearing limits, to avoid issues.
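
An alarm definition is driven by a metric query. As a rough sketch, the helper below builds a query of the form used for the compute agent's CpuUtilization metric (namespace oci_computeagent); the 90 percent threshold and 5 minute window are assumptions to tune, and `cpu_alarm_query` is a hypothetical name.

```python
def cpu_alarm_query(threshold_percent=90, interval="5m"):
    """Build a monitoring query that fires when mean CPU use exceeds the threshold."""
    return f"CpuUtilization[{interval}].mean() > {threshold_percent}"


if __name__ == "__main__":
    # Paste the result into the alarm's query field (advanced mode),
    # with the metric namespace set to oci_computeagent.
    print(cpu_alarm_query())
```

A window of several minutes avoids alerts on short spikes, which are normal during model training or package builds.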

2. OCI Vault (for API Keys & Credentials)

  • Purpose: Securely store sensitive credentials (e.g., API keys for other services, if you expand your lab).
  • How to provision (Console):
    1. OCI Console: Identity & Security > Vault.
    2. Create Vault.
    3. Inside your vault, create Keys (for encryption) and then Secrets (your actual credentials).
    4. Access these secrets from your A1 instance using the OCI SDK (Python SDK is excellent for this).
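
Reading a secret from the A1 instance with the Python SDK could look like the sketch below. It assumes the `oci` package is installed and that the instance is authorized through instance principals (a dynamic group plus a policy allowing it to read secret bundles); if you use an API key instead, build the client from oci.config.from_file(). The environment variable name and helper names are hypothetical.

```python
import base64
import os


def decode_secret_content(b64_content):
    """Secret bundles return base64 text; decode it to the original string."""
    return base64.b64decode(b64_content).decode("utf-8")


def read_secret(secret_ocid):
    """Fetch the current version of a secret using instance principal auth."""
    import oci  # imported lazily so decode_secret_content works without the SDK

    signer = oci.auth.signers.InstancePrincipalsSecurityTokenSigner()
    client = oci.secrets.SecretsClient(config={}, signer=signer)
    bundle = client.get_secret_bundle(secret_ocid).data
    return decode_secret_content(bundle.secret_bundle_content.content)


if __name__ == "__main__":
    # HOMELAB_SECRET_OCID is a hypothetical variable holding the secret's OCID.
    value = read_secret(os.environ["HOMELAB_SECRET_OCID"])
    print("secret loaded,", len(value), "characters")
```

Printing only the length keeps the secret itself out of terminal scrollback and logs.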

3. Network Security Groups (NSGs) (Advanced)

  • Purpose: More granular network control than Security Lists, allowing you to apply rules directly to VNICs, not subnets.
  • Why use it: If you add more instances or diversify services within the same subnet, NSGs give you more precise control.
  • How to provision:
    1. OCI Console: Networking > Virtual Cloud Networks > homelab-vcn > Network Security Groups.
    2. Create an NSG (e.g., data-science-nsg). Add ingress rules specific to your A1 instance (SSH, Jupyter, RStudio).
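    3. When you create/edit your A1 instance, assign its VNIC to this NSG. NSG rules are evaluated together with the subnet's security lists: traffic is allowed if a rule in either one permits it, so the NSG supplements the security list rather than overriding it. To rely on the NSG alone, remove the overlapping rules from the security list.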

This detailed plan focuses on setting up your core OCI environment for data science and Linux power usage. Remember to keep all credentials (SSH keys, ADB passwords, Wallet passwords) extremely secure. Have fun building out your powerful free home lab!