As a Linux expert performing system analysis, I follow a structured approach to diagnose issues, optimize performance, and ensure system stability. Here’s my step-by-step methodology:
1. Define the Scope
- Identify the problem: Is it performance-related, stability, security, or functionality?
- Gather symptoms: Error messages, logs, user reports, or observable behavior (e.g., slow response, crashes).
2. Gather System Overview
- Hardware:
    lscpu       # CPU info
    free -h     # Memory usage
    lsblk       # Disk layout
    lspci       # PCI devices
    lsusb       # USB devices
    dmidecode   # Detailed hardware info (requires root)
- OS/Kernel:
    uname -a              # Kernel version
    cat /etc/os-release   # Distro info
    hostnamectl           # Hostname (static/transient), OS, and kernel summary
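The overview commands can be bundled into one snapshot so the same baseline is captured on every host. The following is a minimal sketch, not a standard tool: the function name `collect_overview` and the choice of commands are assumptions made for illustration, and commands that are not installed are reported rather than treated as errors.

```shell
# Sketch: run each overview command that is installed and label its output.
# collect_overview and its command list are illustrative, not a standard utility.
collect_overview() {
    for cmd in "uname -a" "cat /etc/os-release" "lscpu" "free -h" "lsblk"; do
        tool=${cmd%% *}                          # first word is the binary name
        if command -v "$tool" >/dev/null 2>&1; then
            printf '=== %s ===\n' "$cmd"
            $cmd 2>&1 || true                    # keep going if one command fails
        else
            printf '=== %s (not installed) ===\n' "$cmd"
        fi
    done
}
```

A typical call would redirect the output to a dated file, for example `collect_overview > overview.txt`, so it can be compared with a later snapshot.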
3. Check Real-Time System State
- Resource Monitoring:
    top -c         # Interactive process view (CPU/MEM)
    htop           # Enhanced top (install if needed)
    vmstat 1       # System activity, memory, CPU, I/O
    iostat -xz 1   # Disk I/O stats
    dstat          # Combined stats (CPU, disk, network)
    nload          # Network throughput per interface
    iftop          # Network traffic per connection
- Process Analysis:
    ps auxf     # Process tree
    pstree      # Visual hierarchy
    pidstat 1   # Per-process CPU stats (add -r for memory, -d for disk I/O)
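A raw load average is hard to judge without the CPU count next to it. As a rough sketch, the helper below divides one by the other; the name `load_per_cpu` is hypothetical, and the "above 1.0 means saturated" reading is only a rule of thumb, since on Linux the load average also counts tasks blocked in uninterruptible I/O.

```shell
# Sketch: normalize a load average by the number of CPUs.
# load_per_cpu is an illustrative helper, not a standard command.
load_per_cpu() {
    # $1 = load average, $2 = number of CPUs
    awk -v load="$1" -v cpus="$2" 'BEGIN { printf "%.2f\n", load / cpus }'
}

# Live usage on Linux: 1-minute load from /proc/loadavg, CPU count from nproc.
# load_per_cpu "$(cut -d' ' -f1 /proc/loadavg)" "$(nproc)"
```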
4. Inspect Logs
- System Logs:
    journalctl -xe --no-pager -n 50   # Systemd journal (most recent entries)
    tail -f /var/log/syslog           # General logs (Debian/Ubuntu)
    tail -f /var/log/messages         # General logs (RHEL)
- Service-Specific Logs:
    Check /var/log/ for nginx/, apache2/, mysql/, etc.
- Kernel/Errors:
    dmesg -T | tail -50                    # Kernel ring buffer (time-stamped)
    grep -ri error /var/log/ 2>/dev/null   # Search for errors (recursive)
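A plain grep across the log directory can return thousands of lines, so it often helps to see which file is noisiest first. Here is one possible sketch, assuming plain-text logs; `count_matches` is a hypothetical helper name, and compressed or binary logs (including the systemd journal) are outside its scope.

```shell
# Sketch: count case-insensitive matches of a pattern in each given file and
# print "count file", noisiest file first. Unreadable or missing files are skipped.
count_matches() {
    pattern=$1; shift
    for f in "$@"; do
        [ -f "$f" ] && [ -r "$f" ] || continue
        printf '%s %s\n' "$(grep -ci -- "$pattern" "$f")" "$f"
    done | sort -rn
}

# Example: count_matches error /var/log/syslog /var/log/auth.log
```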
5. Disk and Filesystem Analysis
- Space Usage:
    df -hT                  # Filesystem space
    du -sh /* 2>/dev/null   # Size of each top-level directory
    ncdu                    # Interactive disk usage (installable)
- I/O Performance:
    iotop -o   # Disk I/O by process
    sar -d 1   # Per-device disk stats every second (sysstat package); omit the interval to read today's recorded history, if collection is enabled
- Filesystem Health:
    fsck <device>          # Filesystem check (unmount first)
    smartctl -a /dev/sda   # SMART data for disks (smartmontools package)
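The space check can be turned into a simple threshold filter. The sketch below parses the POSIX output format of `df -P`; the function name `over_threshold` is an assumption for illustration, and mount points containing spaces are not handled.

```shell
# Sketch: print mount points whose usage is at or above a percentage limit.
# Reads `df -P` output on stdin: column 5 is the capacity, column 6 the mount point.
over_threshold() {
    # $1 = percent threshold
    awk -v limit="$1" 'NR > 1 {
        use = $5; sub(/%/, "", use)
        if (use + 0 >= limit + 0) print $6, $5
    }'
}

# Example: df -P | over_threshold 90
```

Reading from stdin keeps the filter usable on saved `df` output from another host as well as on the live system.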
6. Network Analysis
- Connections:
    ss -tulnp       # Listening TCP/UDP sockets with owning processes (replaces netstat)
    netstat -tuln   # Legacy socket info
    lsof -i         # Open network connections
- Performance:
    ping <host>           # Latency check
    traceroute <host>     # Route analysis
    mtr <host>            # Route analysis with per-hop loss and latency
    iperf3 -c <server>    # Bandwidth test (needs iperf3 -s running on the server)
    ethtool <interface>   # NIC settings
- Firewall/Routing:
    iptables -L -n -v   # Firewall rules
    nft list ruleset    # For nftables
    ip route            # Routing table
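Beyond listing sockets, a per-state tally gives a quick picture of connection health; a large pile of CLOSE-WAIT entries, for instance, often suggests an application that is not closing its connections. A minimal sketch follows, assuming the default `ss -tan` layout where the state is the first column; `tcp_state_counts` is a hypothetical name.

```shell
# Sketch: tally TCP connection states from `ss -tan` output read on stdin.
# The header line is skipped; output is "count STATE", most frequent first.
tcp_state_counts() {
    awk 'NR > 1 { n[$1]++ } END { for (s in n) print n[s], s }' | sort -rn
}

# Example: ss -tan | tcp_state_counts
```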
7. Performance Profiling
- CPU:
    perf top          # CPU profiling (install linux-tools)
    mpstat -P ALL 1   # Per-CPU stats
- Memory:
    cat /proc/meminfo   # Detailed memory stats
    slabtop             # Kernel slab cache usage
- Bottlenecks:
    strace -p <PID>          # System calls of a running process
    ltrace -p <PID>          # Library calls of a running process
    tcpdump -i <interface>   # Packet capture
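For memory, the MemAvailable field in /proc/meminfo is usually a better indicator of headroom than the free figure alone. The sketch below computes it as a percentage of MemTotal; `mem_available_pct` is an illustrative name, and it assumes a kernel that exposes MemAvailable (3.14 or later).

```shell
# Sketch: print MemAvailable as a percentage of MemTotal.
# $1 = path to a meminfo-style file (defaults to /proc/meminfo).
mem_available_pct() {
    awk '/^MemTotal:/     { total = $2 }
         /^MemAvailable:/ { avail = $2 }
         END { if (total > 0) printf "%.1f\n", avail * 100 / total }' "${1:-/proc/meminfo}"
}

# Example: mem_available_pct
```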
8. Security Checks
- User/Sessions:
    who -a    # Logged-in users
    last      # Login history
    sudo -l   # Current user's sudo privileges
- Audit:
    auditctl -l        # Loaded audit rules (if auditd is enabled)
    chkrootkit         # Rootkit scan
    rkhunter --check   # Rootkit scan
- SUID/SGID Files:
    find / -perm -4000 -type f 2>/dev/null   # SUID files
    find / -perm -2000 -type f 2>/dev/null   # SGID files
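A SUID listing is most useful when compared against a known-good baseline, because the interesting entries are the ones that appeared since. This is a sketch under the assumption that both lists hold one path per line; `new_since_baseline` and the file names in the example are hypothetical.

```shell
# Sketch: print paths that are in the current list but not in the baseline.
# $1 = baseline list, $2 = current list (one path per line, any order).
new_since_baseline() {
    b=$(mktemp) && c=$(mktemp) || return 1
    sort "$1" > "$b"
    sort "$2" > "$c"
    comm -13 "$b" "$c"      # suppress baseline-only and common lines
    rm -f "$b" "$c"
}

# Example:
#   find / -xdev -perm -4000 -type f 2>/dev/null > suid.current
#   new_since_baseline suid.baseline suid.current
```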
9. Configuration Review
- Critical Files:
    cat /etc/sysctl.conf                        # Kernel parameters
    cat /etc/security/limits.conf               # User limits
    systemctl list-unit-files --state=enabled   # Enabled services
- Cron/At Jobs:
    crontab -l -u root   # Root's cron jobs
    ls /etc/cron.*       # System cron
    atq                  # Pending at jobs
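Configuration files such as /etc/sysctl.conf are mostly comments, which makes the effective settings hard to spot with `cat`. One way to show only the active lines is sketched below; `active_settings` is a hypothetical helper, and it treats lines starting with `#` or `;` as comments, which matches sysctl.conf but may not suit every file format.

```shell
# Sketch: print only non-comment, non-blank lines of a config file.
# Lines whose first non-space character is '#' or ';' count as comments.
active_settings() {
    grep -Ev '^[[:space:]]*([#;]|$)' "$1"
}

# Example: active_settings /etc/sysctl.conf
```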
10. Reproduce and Test
- Stress Testing:
    stress --cpu 4 --io 2 --vm 2 --vm-bytes 1G --timeout 60s   # Simulate load for 60 seconds
- Benchmark:
    sysbench cpu run   # CPU benchmark (memory and fileio tests are also available)
11. Document and Resolve
- Summarize findings (e.g., "High I/O wait due to MySQL").
- Propose fixes (e.g., optimize queries, add RAM, adjust swappiness).
- Implement changes incrementally and monitor impact.
Key Tools to Install if Missing
    apt install sysstat htop iotop nload iftop dstat ltrace strace
    apt install linux-perf                # perf on Debian
    apt install linux-tools-$(uname -r)   # perf on Ubuntu
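Before installing anything, it can help to check which tools are actually missing. The sketch below does that with `command -v`; the name `missing_tools` is an assumption, and note that it takes binary names, which differ from package names in some cases (for example, `iostat`, `sar`, and `pidstat` all come from the sysstat package).

```shell
# Sketch: print the name of every given command that is not on PATH.
missing_tools() {
    for t in "$@"; do
        command -v "$t" >/dev/null 2>&1 || printf '%s\n' "$t"
    done
}

# Example: missing_tools htop iotop iostat sar strace ltrace perf
```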
This approach ensures thorough analysis while minimizing disruption to live systems. Adjust steps based on the specific issue (e.g., focus on journalctl for boot failures or sar for historical trends).