the_information_nexus/linux_101.md at 116fcc1fec122f806309d010bb9363f8e4a5278a

medusa 8d159aac0b Add tech_docs/linux/linux_101.md

2025-06-27 01:43:37 +00:00

As a Linux expert performing system analysis, I follow a structured approach to diagnose issues, optimize performance, and ensure system stability. Here’s my step-by-step methodology:

1. Define the Scope

Identify the problem: Is it performance-related, stability, security, or functionality?
Gather symptoms: Error messages, logs, user reports, or observable behavior (e.g., slow response, crashes).

2. Gather System Overview

Hardware:

lscpu          # CPU info
free -h        # Memory usage
lsblk          # Disk layout
lspci          # PCI devices
lsusb          # USB devices
dmidecode      # Detailed hardware info (requires root)

OS/Kernel:

uname -a       # Kernel version
cat /etc/os-release  # Distro info
hostnamectl    # Hostname (static/transient), OS, kernel, and architecture
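
The overview commands above can be captured in one pass so the output can be attached to a ticket. This is a minimal sketch, not a definitive script: the function names `run_section` and `collect_overview` are made up for illustration, and `dmidecode` is left out because it needs root.

```shell
#!/bin/sh
# Print a header for one command, then its output, or a note if the tool
# is missing. run_section is an illustrative name, not a standard tool.
run_section() {
    tool=${1%% *}                      # first word of the command line
    printf '### %s\n' "$1"
    if command -v "$tool" >/dev/null 2>&1; then
        # Word splitting of $1 is intentional: it holds "command args".
        $1 2>&1 || printf '(exit status %s)\n' "$?"
    else
        printf '(not installed)\n'
    fi
}

# Run every overview command from the list above in order.
collect_overview() {
    for cmd in "uname -a" "cat /etc/os-release" "hostnamectl" \
               "lscpu" "free -h" "lsblk" "lspci" "lsusb"; do
        run_section "$cmd"
    done
}
```

Typical use would be `collect_overview > overview.txt`, where the file name is arbitrary.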

3. Check Real-Time System State

Resource Monitoring:

top -c         # Interactive process view (CPU/MEM)
htop           # Enhanced top (install if needed)
vmstat 1       # System activity, memory, CPU, I/O
iostat -xz 1   # Disk I/O stats
dstat          # Combined stats (CPU, disk, network)
nload          # Per-interface network throughput
iftop          # Per-connection network traffic

Process Analysis:

ps auxf        # Process tree
pstree         # Visual hierarchy
pidstat -urd 1 # Per-process CPU (-u), memory (-r), and disk (-d) stats
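
When watching `vmstat 1`, the `wa` (I/O wait) column is usually the first thing to check. The sketch below filters `vmstat` output for samples above a threshold. The function name `high_iowait` and the default of 20 percent are assumptions chosen for illustration; the column is located by its header name because its position can differ between procps versions.

```shell
#!/bin/sh
# Read vmstat output on stdin and print samples whose I/O wait (wa)
# is above the given percentage. high_iowait is an illustrative name.
high_iowait() {
    awk -v limit="${1:-20}" '
        # Until the column-name row has been seen, look for "wa" in each row.
        !col { for (i = 1; i <= NF; i++) if ($i == "wa") col = i; next }
        # Data rows have a numeric wa field; repeated header rows do not.
        $col ~ /^[0-9]+$/ && $col + 0 > limit + 0 {
            printf "iowait %d%% (r=%s b=%s)\n", $col, $1, $2
        }
    '
}
```

For example, `vmstat 1 30 | high_iowait 25` would report any of 30 one-second samples with more than 25% I/O wait, along with the run-queue (`r`) and blocked (`b`) counts.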

4. Inspect Logs

System Logs:

journalctl -xe --no-pager -n 50  # Systemd logs (most recent)
tail -f /var/log/syslog          # General logs (Debian)
tail -f /var/log/messages        # General logs (RHEL)

Service-Specific Logs:
Check /var/log/ for nginx/, apache2/, mysql/, etc.

Kernel/Errors:

dmesg -T | tail -50  # Kernel ring buffer (time-stamped)
grep -ri error /var/log/ 2>/dev/null  # Search all logs for errors (recursive; run as root to read every file)
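
A raw `grep` for "error" can return thousands of lines, so it helps to see which program produces most of them. The sketch below assumes the traditional syslog line layout (`Mon DD HH:MM:SS host program[pid]: message`); systems configured for ISO timestamps put the program name in a different field. The function name `error_sources` is made up for illustration.

```shell
#!/bin/sh
# Read syslog-style lines on stdin and count lines containing "error"
# (case-insensitive) per program, most frequent first.
# Assumes the program tag is the 5th whitespace-separated field.
error_sources() {
    grep -i 'error' |
    awk '{
             prog = $5
             sub(/\[[0-9]+\]:?$/, "", prog)   # drop a trailing [pid]:
             sub(/:$/, "", prog)              # drop a bare trailing colon
             n[prog]++
         }
         END { for (p in n) printf "%d %s\n", n[p], p }' |
    sort -k1,1nr -k2,2
}
```

A typical call would be `error_sources < /var/log/syslog`. On systemd hosts, `journalctl -p err -b --no-pager` gives a comparable view filtered by priority for the current boot.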

5. Disk and Filesystem Analysis

Space Usage:

df -hT         # Filesystem space
du -sh /*      # Directory sizes (root)
ncdu           # Interactive disk usage (installable)

I/O Performance:

iotop -o       # Disk I/O by process
sar -d 1       # Live disk stats every second; run "sar -d" without an interval for today's history (sysstat package)

Filesystem Health:

fsck <device>  # Filesystem check (unmount the filesystem first)
smartctl -a /dev/sda  # SMART data for disks
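
The space check can be reduced to "which filesystems are nearly full". The sketch below reads the POSIX-format output of `df -P` and prints mount points at or above a threshold. The function name `full_filesystems` and the default of 90 percent are assumptions for illustration, and mount points containing spaces are not handled.

```shell
#!/bin/sh
# Read `df -P` output on stdin and print mount points whose usage is
# at or above the given percentage. full_filesystems is an illustrative name.
full_filesystems() {
    awk -v limit="${1:-90}" '
        NR > 1 {                       # skip the header row
            pct = $5
            sub(/%$/, "", pct)         # "98%" -> "98"
            if (pct + 0 >= limit + 0) printf "%s %s%%\n", $6, pct
        }
    '
}
```

For example, `df -P | full_filesystems 85` lists every mount point at 85% usage or higher.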

6. Network Analysis

Connections:

ss -tulnp      # Sockets (replaces netstat)
netstat -tuln  # Legacy socket info
lsof -i        # Open network connections

Performance:

ping <host>        # Latency check
traceroute <host>  # Route analysis
mtr <host>         # Continuous route and packet-loss analysis
iperf3         # Bandwidth test
ethtool <interface>  # NIC settings

Firewall/Routing:

iptables -L -n -v  # Firewall rules
nft list ruleset   # For nftables
ip route           # Routing table
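
A quick way to spot connection problems (for example a pile-up of `TIME-WAIT` or `CLOSE-WAIT` sockets) is to count TCP sockets by state. The sketch below summarizes the output of `ss -tan`; the function name `socket_states` is made up for illustration.

```shell
#!/bin/sh
# Read `ss -tan` output on stdin and print one "STATE count" line per
# TCP state, sorted by state name. socket_states is an illustrative name.
socket_states() {
    awk 'NR > 1 { n[$1]++ }            # first column is the state; skip header
         END { for (s in n) printf "%s %d\n", s, n[s] }' |
    sort
}
```

Typical use is `ss -tan | socket_states`.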

7. Performance Profiling

CPU:

perf top       # CPU profiling (install linux-tools)
mpstat -P ALL 1 # Per-CPU stats

Memory:

cat /proc/meminfo  # Detailed memory stats
slabtop        # Kernel slab cache usage

Bottlenecks:

strace -p <PID>  # System calls of a process
ltrace          # Library calls
tcpdump         # Packet capture
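
For memory pressure, `MemAvailable` relative to `MemTotal` is generally a more useful figure than free memory alone. The sketch below computes that percentage from `/proc/meminfo`. The function name `mem_available_pct` is an assumption for illustration, and the optional file argument exists only so the function can be run against a saved copy.

```shell
#!/bin/sh
# Print MemAvailable as a whole-number percentage of MemTotal.
# Reads /proc/meminfo unless another file is given as the first argument.
mem_available_pct() {
    awk '
        $1 == "MemTotal:"     { total = $2 }
        $1 == "MemAvailable:" { avail = $2 }
        END {
            if (total > 0) printf "%d\n", avail * 100 / total
            else exit 1                # no MemTotal line found
        }
    ' "${1:-/proc/meminfo}"
}
```

Calling `mem_available_pct` with no arguments reports the value for the running system.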

8. Security Checks

User/Sessions:

who -a          # Logged-in users
last            # Login history
sudo -l         # User’s sudo privileges

Audit:

auditctl -l       # Loaded audit rules (if auditd is enabled)
chkrootkit        # Rootkit scan
rkhunter --check  # Rootkit scan

SUID/SGID Files:

find / -perm /6000 -type f 2>/dev/null  # Files with the SUID or SGID bit set
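
A list of SUID/SGID files is most useful when compared against a known-good baseline. The sketch below prints entries that appear in a current list but not in a saved one. The function name `new_since_baseline` and the file paths in the usage note are hypothetical.

```shell
#!/bin/sh
# Print lines present in the current list ($2) but absent from the
# baseline ($1). Both inputs are files with one path per line.
new_since_baseline() {
    base_sorted=$(mktemp) && cur_sorted=$(mktemp) || return 1
    sort -u "$1" > "$base_sorted"
    sort -u "$2" > "$cur_sorted"
    comm -13 "$base_sorted" "$cur_sorted"   # keep lines unique to the 2nd file
    rm -f "$base_sorted" "$cur_sorted"
}
```

One possible workflow: save `find / -xdev -perm /6000 -type f 2>/dev/null` to a baseline file after a clean install, save the same command's output to a second file later, and run `new_since_baseline baseline.txt current.txt`.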

9. Configuration Review

Critical Files:

cat /etc/sysctl.conf     # Kernel parameters
cat /etc/security/limits.conf  # User limits
systemctl list-unit-files --state=enabled  # Enabled services

Cron/At Jobs:

crontab -l -u root  # Root’s cron jobs
ls /etc/cron.*       # System cron
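
Reading `/etc/sysctl.conf` shows what is configured, not what is running. The sketch below compares each `key = value` line with the live value under `/proc/sys`. The function name `sysctl_drift` is made up for illustration, the second argument exists only so the function can be pointed at a copy of the tree, and keys whose components themselves contain dots (such as VLAN interface names) are not handled.

```shell
#!/bin/sh
# Report sysctl keys whose running value differs from the config file.
# $1: config file, $2: root of the sysctl tree (default /proc/sys).
sysctl_drift() {
    conf=$1
    root=${2:-/proc/sys}
    # Drop comment lines (# or ;) and blank lines, then split on the first "=".
    sed -e '/^[[:space:]]*[#;]/d' -e '/^[[:space:]]*$/d' "$conf" |
    while IFS='=' read -r key want; do
        key=$(printf '%s' "$key" | tr -d '[:space:]')
        # Collapse whitespace so "a  b" and "a<TAB>b" compare equal.
        want=$(printf '%s' "$want" | tr -s '[:space:]' ' ' | sed -e 's/^ //' -e 's/ $//')
        path="$root/$(printf '%s' "$key" | tr . /)"
        [ -r "$path" ] || { printf '%s: not present\n' "$key"; continue; }
        have=$(tr -s '[:space:]' ' ' < "$path" | sed -e 's/^ //' -e 's/ $//')
        [ "$have" = "$want" ] || printf '%s: running=%s configured=%s\n' "$key" "$have" "$want"
    done
}
```

Running `sysctl_drift /etc/sysctl.conf` prints nothing when the running values match the file.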

10. Reproduce and Test

Stress Testing:

stress --cpu 4 --io 2 --vm 2 --vm-bytes 1G  # Simulate load

Benchmark:

sysbench      # CPU/memory/disk benchmarks
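
A stress run is more informative when system load is sampled while it executes. The sketch below starts a command in the background and prints the 1-minute load average at a fixed interval until the command exits. The function name `sample_during` is an assumption for illustration, and the code relies on `/proc/loadavg`, so it is Linux-only.

```shell
#!/bin/sh
# Run a command in the background and print the 1-minute load average
# every $1 seconds until it exits. Returns the command's exit status.
sample_during() {
    interval=$1
    shift
    "$@" >/dev/null 2>&1 &
    pid=$!
    while kill -0 "$pid" 2>/dev/null; do    # true while the process exists
        printf '%s load1=%s\n' "$(date +%H:%M:%S)" "$(cut -d' ' -f1 /proc/loadavg)"
        sleep "$interval"
    done
    wait "$pid"
}
```

For example, `sample_during 5 stress --cpu 4 --timeout 60` would log the load every five seconds during a one-minute CPU stress run.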

11. Document and Resolve

Summarize findings (e.g., "High I/O wait due to MySQL").
Propose fixes (e.g., optimize queries, add RAM, adjust swappiness).
Implement changes incrementally and monitor impact.

Key Tools to Install if Missing

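apt install sysstat htop iotop nload iftop dstat ltrace strace
apt install linux-perf   # perf on Debian; on Ubuntu, install linux-tools-common and linux-tools-$(uname -r)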
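
Before installing anything, it can help to see which tools are actually missing. The sketch below prints every name from its argument list that is not found on `PATH`; the function name `missing_tools` is made up for illustration.

```shell
#!/bin/sh
# Print each named tool that is not available on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}
```

A call such as `missing_tools sar htop iotop nload iftop perf dstat ltrace strace` checks for the commands used in this guide (`sar` is provided by the sysstat package).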

This approach ensures thorough analysis while minimizing disruption to live systems. Adjust steps based on the specific issue (e.g., focus on journalctl for boot failures or sar for historical trends).
