As a Linux expert performing system analysis, I follow a structured approach to diagnose issues, optimize performance, and ensure system stability. Here's my step-by-step methodology:


1. Define the Scope

  • Identify the problem: Is it performance-related, stability, security, or functionality?
  • Gather symptoms: Error messages, logs, user reports, or observable behavior (e.g., slow response, crashes).

2. Gather System Overview

  • Hardware:
    lscpu          # CPU info
    free -h        # Memory usage
    lsblk          # Disk layout
    lspci/lsusb    # Hardware devices
    dmidecode      # Detailed hardware info (requires root)
    
  • OS/Kernel:
    uname -a       # Kernel version
    cat /etc/os-release  # Distro info
    hostnamectl    # Hostname (static/transient), OS, and kernel summary
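
The overview commands above can be gathered into a single report so the starting state is captured before anything changes. This is a minimal sketch, not a standard tool: `run_section` and `collect_overview` are hypothetical helper names, and commands that are not installed are skipped rather than treated as errors.

```shell
#!/bin/sh
# Sketch: collect a system overview into one report, skipping missing tools.

# run_section TITLE CMD [ARGS...] (hypothetical helper): print a header,
# then the command's output, or a note when the command is unavailable.
run_section() {
    title=$1
    shift
    printf '=== %s ===\n' "$title"
    if command -v "$1" >/dev/null 2>&1; then
        "$@" 2>&1 || printf '(command exited with status %s)\n' "$?"
    else
        printf '(skipped: %s not found)\n' "$1"
    fi
    printf '\n'
}

# collect_overview (hypothetical helper): the unprivileged subset of step 2.
collect_overview() {
    run_section "CPU" lscpu
    run_section "Memory" free -h
    run_section "Block devices" lsblk
    run_section "Kernel" uname -a
    run_section "Distribution" cat /etc/os-release
}

collect_overview
```

Redirecting the output to a dated file (for example `sh overview.sh > overview-$(date +%F).txt`, where `overview.sh` is an assumed script name) gives a baseline to compare against later.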
    

3. Check Real-Time System State

  • Resource Monitoring:
    top -c         # Interactive process view (CPU/MEM)
    htop           # Enhanced top (install if needed)
    vmstat 1       # System activity, memory, CPU, I/O
    iostat -xz 1   # Disk I/O stats
    dstat          # Combined stats (CPU, disk, network)
    nload/iftop    # Network traffic monitoring
    
  • Process Analysis:
    ps auxf        # Process tree
    pstree         # Visual hierarchy
    pidstat -urd 1 # Per-process CPU (-u), memory (-r), and disk (-d) stats
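
A load average from `top` or `uptime` only means something relative to the number of CPUs. The sketch below divides the 1-minute load by the CPU count; `load_per_cpu` is a hypothetical helper name, and the script assumes a Linux `/proc/loadavg`.

```shell
#!/bin/sh
# Sketch: express the 1-minute load average per CPU.

# load_per_cpu LOAD NCPU (hypothetical helper): print LOAD / NCPU
# with two decimals.
load_per_cpu() {
    awk -v load="$1" -v ncpu="$2" 'BEGIN { printf "%.2f\n", load / ncpu }'
}

# The first field of /proc/loadavg is the 1-minute load average.
if [ -r /proc/loadavg ]; then
    read -r load1 _rest < /proc/loadavg
    # Fall back to 1 CPU if neither nproc nor getconf is available.
    ncpu=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
    printf 'load1=%s cpus=%s per_cpu=%s\n' \
        "$load1" "$ncpu" "$(load_per_cpu "$load1" "$ncpu")"
fi
```

As a rough rule, a sustained per-CPU value above 1.00 suggests more runnable (or I/O-blocked) tasks than cores, which is the cue to look at `vmstat` and `iostat` next.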
    

4. Inspect Logs

  • System Logs:
    journalctl -xe --no-pager -n 50  # Systemd logs (most recent)
    tail -f /var/log/syslog          # General logs (Debian)
    tail -f /var/log/messages        # General logs (RHEL)
    
  • Service-Specific Logs:
    Check /var/log/ for nginx/, apache2/, mysql/, etc.
  • Kernel/Errors:
    dmesg -T | tail -50  # Kernel ring buffer (time-stamped)
    grep -i error /var/log/* 2>/dev/null  # Search top-level log files for "error"
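
A raw `grep` across logs can produce a wall of output. One way to triage is to count matching lines per file first and read the noisiest file in detail. This is a sketch under assumptions: `count_matches` is a hypothetical helper, and the `/var/log/*.log` glob is only an example of where logs might live.

```shell
#!/bin/sh
# Sketch: rank log files by the number of lines matching a pattern.

# count_matches PATTERN FILE... (hypothetical helper): print "count<TAB>file"
# for each readable regular file, highest count first.
# grep -c counts matching lines, not individual matches.
count_matches() {
    pattern=$1
    shift
    for f in "$@"; do
        [ -f "$f" ] && [ -r "$f" ] || continue
        # grep exits non-zero on zero matches but still prints 0.
        n=$(grep -ci -- "$pattern" "$f" || true)
        printf '%s\t%s\n' "$n" "$f"
    done | sort -rn
}

count_matches error /var/log/*.log
```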
    

5. Disk and Filesystem Analysis

  • Space Usage:
    df -hT         # Filesystem space
    du -sh /*      # Directory sizes (root)
    ncdu           # Interactive disk usage (installable)
    
  • I/O Performance:
    iotop -o       # Disk I/O by process
    sar -d 1       # Live disk stats at 1 s intervals; plain "sar -d" reads today's history (sysstat package)
    
  • Filesystem Health:
    fsck <device>  # Filesystem check (unmount first)
    smartctl -a /dev/sda  # SMART data for disks
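
To turn the `df` check into something scriptable, the POSIX output format (`df -P`) can be filtered by usage percentage. A minimal sketch: `over_threshold` is a hypothetical helper, the 90% limit is an arbitrary example, and mount points containing spaces are not handled.

```shell
#!/bin/sh
# Sketch: list filesystems at or above a usage threshold.

# over_threshold LIMIT (hypothetical helper): read `df -P` output on stdin
# and print "mountpoint use%" for every filesystem at or above LIMIT percent.
# Assumes the POSIX column order: filesystem, blocks, used, available,
# capacity, mount point.
over_threshold() {
    awk -v limit="$1" '
        NR > 1 {
            pct = $5
            sub(/%$/, "", pct)          # strip the trailing percent sign
            if (pct + 0 >= limit + 0) printf "%s %s%%\n", $6, pct
        }'
}

df -P | over_threshold 90
```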
    

6. Network Analysis

  • Connections:
    ss -tulnp      # Sockets (replaces netstat)
    netstat -tuln  # Legacy socket info
    lsof -i        # Open network connections
    
  • Performance:
    ping           # Latency check
    traceroute/mtr # Route analysis
    iperf3         # Bandwidth test
    ethtool <interface>  # NIC settings
    
  • Firewall/Routing:
    iptables -L -n -v  # Firewall rules
    nft list ruleset   # For nftables
    ip route           # Routing table
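
When comparing what is listening against what the firewall allows, a compact list of protocol and port pairs is easier to read than the full `ss` table. The sketch below assumes the column layout that `ss -tuln` prints (Netid, State, Recv-Q, Send-Q, local address, peer address); `listening_ports` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: reduce `ss -tuln` output to unique "proto port" pairs.

# listening_ports (hypothetical helper): read `ss -tuln` output on stdin.
# The local address is field 5; the port is whatever follows its last colon,
# which also covers IPv6 forms such as [::]:22.
listening_ports() {
    awk 'NR > 1 {
            port = $5
            sub(/.*:/, "", port)
            print $1, port
        }' | sort -u
}

if command -v ss >/dev/null 2>&1; then
    ss -tuln | listening_ports
fi
```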
    

7. Performance Profiling

  • CPU:
    perf top       # CPU profiling (install linux-tools)
    mpstat -P ALL 1 # Per-CPU stats
    
  • Memory:
    cat /proc/meminfo  # Detailed memory stats
    slabtop        # Kernel slab cache usage
    
  • Bottlenecks:
    strace -p <PID>  # System calls of a process
    ltrace          # Library calls
    tcpdump         # Packet capture
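
For memory, the most useful single number in `/proc/meminfo` is usually `MemAvailable` relative to `MemTotal`, since "free" memory ignores reclaimable cache. A minimal sketch, with `mem_available_pct` as a hypothetical helper name:

```shell
#!/bin/sh
# Sketch: report available memory as a percentage of total.

# mem_available_pct (hypothetical helper): read /proc/meminfo-format text
# on stdin and print MemAvailable as a percentage of MemTotal.
mem_available_pct() {
    awk '$1 == "MemTotal:"     { total = $2 }
         $1 == "MemAvailable:" { avail = $2 }
         END {
             if (total > 0) printf "%.1f\n", 100 * avail / total
             else exit 1
         }'
}

if [ -r /proc/meminfo ]; then
    printf 'available: %s%%\n' "$(mem_available_pct < /proc/meminfo)"
fi
```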
    

8. Security Checks

  • User/Sessions:
    who -a          # Logged-in users
    last            # Login history
    sudo -l         # Current user's sudo privileges
    
  • Audit:
    auditd          # Audit framework (if enabled)
    chkrootkit/rkhunter  # Rootkit scans
    
  • SUID/SGID Files:
    find / -perm -4000 -type f 2>/dev/null  # SUID files
    find / -perm -2000 -type f 2>/dev/null  # SGID files
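
A SUID listing is most informative when compared against a known-good baseline, so that only new entries need review. The sketch below shows one way to do that with `comm`; `new_since_baseline` is a hypothetical helper, and the file paths in the comments are illustrative.

```shell
#!/bin/sh
# Sketch: show entries that appear in a current listing but not in a baseline.

# new_since_baseline BASELINE CURRENT (hypothetical helper): both files hold
# one path per line. They are sorted into temporary copies first because
# comm requires sorted input; comm -13 keeps lines unique to the second file.
new_since_baseline() {
    tmp_base=$(mktemp) || return 1
    tmp_cur=$(mktemp) || { rm -f "$tmp_base"; return 1; }
    sort "$1" > "$tmp_base"
    sort "$2" > "$tmp_cur"
    comm -13 "$tmp_base" "$tmp_cur"
    rm -f "$tmp_base" "$tmp_cur"
}

# Example workflow (paths are assumptions, adjust to taste):
#   find / -perm -4000 -type f 2>/dev/null > /root/suid.baseline   # once
#   find / -perm -4000 -type f 2>/dev/null > /tmp/suid.now         # later
#   new_since_baseline /root/suid.baseline /tmp/suid.now
```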
    

9. Configuration Review

  • Critical Files:
    cat /etc/sysctl.conf     # Kernel parameters
    cat /etc/security/limits.conf  # User limits
    systemctl list-unit-files --state=enabled  # Enabled services
    
  • Cron/At Jobs:
    crontab -l -u root  # Root's cron jobs
    ls /etc/cron.*       # System cron
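
Reading `/etc/sysctl.conf` shows what is configured, not necessarily what the kernel is running with. The sketch below compares the two by mapping each key to its file under `/proc/sys`. It is an illustration under assumptions: `conf_entries` is a hypothetical helper, only `/etc/sysctl.conf` is read (not `/etc/sysctl.d/`), and keys whose components contain dots are not handled.

```shell
#!/bin/sh
# Sketch: report sysctl.conf settings that differ from the running kernel.

# conf_entries (hypothetical helper): read sysctl.conf-format text on stdin
# and print "key value" per setting, skipping blanks and # or ; comments.
conf_entries() {
    sed -e 's/^[[:space:]]*//' -e '/^[#;]/d' -e '/^$/d' |
        awk -F= 'NF >= 2 {
            key = $1
            val = substr($0, index($0, "=") + 1)
            gsub(/^[ \t]+|[ \t]+$/, "", key)
            gsub(/^[ \t]+|[ \t]+$/, "", val)
            print key, val
        }'
}

if [ -r /etc/sysctl.conf ]; then
    conf_entries < /etc/sysctl.conf | while read -r key want; do
        # vm.swappiness lives at /proc/sys/vm/swappiness, and so on.
        path=/proc/sys/$(printf '%s' "$key" | tr . /)
        [ -r "$path" ] || continue
        # Normalize tabs and repeated spaces so multi-value settings compare.
        have=$(tr '\t' ' ' < "$path" | tr -s ' ')
        want=$(printf '%s' "$want" | tr '\t' ' ' | tr -s ' ')
        [ "$have" = "$want" ] ||
            printf '%s: configured %s, running %s\n' "$key" "$want" "$have"
    done
fi
```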
    

10. Reproduce and Test

  • Stress Testing:
    stress --cpu 4 --io 2 --vm 2 --vm-bytes 1G  # Simulate load
    
  • Benchmark:
    sysbench      # CPU/memory/disk benchmarks
    

11. Document and Resolve

  • Summarize findings (e.g., "High I/O wait due to MySQL").
  • Propose fixes (e.g., optimize queries, add RAM, adjust swappiness).
  • Implement changes incrementally and monitor impact.

Key Tools to Install if Missing

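apt install sysstat htop iotop nload iftop dstat ltrace strace

perf is packaged separately: linux-perf on Debian, linux-tools-common plus linux-tools-$(uname -r) on Ubuntu.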
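
Before installing anything, it can help to check which tools are actually absent. A minimal sketch follows; `missing_tools` is a hypothetical helper, and it takes command names rather than package names (for example `sar` is provided by the sysstat package).

```shell
#!/bin/sh
# Sketch: print the names of commands that are not on PATH.

# missing_tools NAME... (hypothetical helper): one missing command per line.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

missing_tools sar htop iotop nload iftop perf dstat ltrace strace
```

An empty result means every listed command was found.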

This approach ensures thorough analysis while minimizing disruption to live systems. Adjust steps based on the specific issue (e.g., focus on journalctl for boot failures or sar for historical trends).