openEuler Operating System: A Detailed Guide to Nginx Troubleshooting

admin · 2026-03-03 08:11:03 · Network security articles · Source: ZONE.CI 全球网

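Article summary: This article walks through Nginx troubleshooting on openEuler. It covers an architecture overview and fault priority definitions, and provides an automated information-collection script. The core content includes service status check commands, configuration syntax checks, and fixes for common errors such as port conflicts and permission problems. Status monitoring and configuration check scripts are included to help operations staff quickly locate startup failures and runtime anomalies. Overall score: 90. Categories: incident response, hands-on experience, security operations, solutions.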



Original

刘军军

运维星火燎原

February 22, 2026, 00:01 · Shanxi


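Contents

  1. Nginx Troubleshooting Overview
  2. Nginx Service Status Checks
  3. Nginx Configuration File Troubleshooting
  4. Nginx Log Analysis
  5. Startup Failure Troubleshooting
  6. Runtime Error Troubleshooting
  7. Performance Troubleshooting
  8. Network Connection Troubleshooting
  9. SSL/TLS Troubleshooting
  10. Common Fault Case Library
  11. Automated Troubleshooting Scripts
  12. Troubleshooting Flowchart
  13. Best Practices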

1. Nginx Troubleshooting Overview

1.1 Nginx Architecture at a Glance

┌─────────────────────────────────────────────────────────────────┐
│                       Nginx Architecture                        │
├─────────────────────────────────────────────────────────────────┤
│  Master process                                                 │
│  ├─ Reads the configuration                                     │
│  ├─ Binds listening ports                                       │
│  └─ Manages worker processes                                    │
│                                                                 │
│  Worker processes × N                                           │
│  ├─ Handle client requests                                      │
│  ├─ Read from / write to disk                                   │
│  └─ Communicate with upstream servers                           │
│                                                                 │
│  Cache processes (cache manager / cache loader) [optional]      │
│  └─ Manage cache files                                          │
└─────────────────────────────────────────────────────────────────┘

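1.2 Troubleshooting Priorities

| Priority | Fault type | Response time | Examples |
| --- | --- | --- | --- |
| P0 | Service completely unavailable | Immediate | Nginx fails to start; port unreachable |
| P1 | Service partially unavailable | 15 minutes | 502/503/504 errors; slow responses |
| P2 | Performance problem | 1 hour | High CPU/memory usage; many connections |
| P3 | Configuration warning | 24 hours | Log warnings; configuration tuning |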

1.3 Fault Information Collection Checklist

#!/bin/bash
# Nginx fault information collection script

COLLECT_DIR="/tmp/nginx_fault_$(date +%Y%m%d_%H%M%S)"
mkdir -p "$COLLECT_DIR"

echo "Collecting Nginx fault information..."

# 1. Service status
systemctl status nginx --no-pager > "$COLLECT_DIR/service_status.txt" 2>&1

# 2. Process information ([n]ginx keeps grep from matching itself)
ps aux | grep "[n]ginx" > "$COLLECT_DIR/process_info.txt" 2>&1
pstree -p | grep nginx >> "$COLLECT_DIR/process_info.txt" 2>&1

# 3. Configuration files
nginx -T > "$COLLECT_DIR/nginx_config.txt" 2>&1
cp -r /etc/nginx "$COLLECT_DIR/nginx_config_backup/" 2>/dev/null

# 4. Log files
cp /var/log/nginx/*.log "$COLLECT_DIR/" 2>/dev/null
journalctl -u nginx --since "2 hours ago" > "$COLLECT_DIR/journal_log.txt" 2>&1

# 5. Network status (netstat requires the net-tools package)
ss -tulpn | grep nginx > "$COLLECT_DIR/network_status.txt" 2>&1
netstat -an | grep ":80 " >> "$COLLECT_DIR/network_status.txt" 2>&1

# 6. System resources
free -h > "$COLLECT_DIR/system_resource.txt" 2>&1
df -h >> "$COLLECT_DIR/system_resource.txt" 2>&1
uptime >> "$COLLECT_DIR/system_resource.txt" 2>&1

# 7. Connection statistics
echo "=== Current connections ===" > "$COLLECT_DIR/connection_stats.txt" 2>&1
ss -tn | wc -l >> "$COLLECT_DIR/connection_stats.txt" 2>&1
echo "=== TIME_WAIT connections ===" >> "$COLLECT_DIR/connection_stats.txt" 2>&1
ss -tn state time-wait | wc -l >> "$COLLECT_DIR/connection_stats.txt" 2>&1

echo "Collection finished: $COLLECT_DIR"

2. Nginx Service Status Checks

2.1 Service Status Commands

# ========== Basic status checks ==========
# Show the service status
systemctl status nginx

# Sample output
$ systemctl status nginx
● nginx.service - The nginx HTTP and reverse proxy server
   Loaded: loaded (/usr/lib/systemd/system/nginx.service; enabled; vendor preset: disabled)
   Active: active (running) since Mon 2024-01-15 10:00:00 CST; 5 days ago
     Docs: man:nginx(8)
  Process: 1234 ExecStartPre=/usr/sbin/nginx -t (code=exited, status=0/SUCCESS)
  Process: 1235 ExecStart=/usr/sbin/nginx (code=exited, status=0/SUCCESS)
 Main PID: 1236 (nginx)
    Tasks: 5 (limit: 4915)
   Memory: 10.5M
      CPU: 1.234s
   CGroup: /system.slice/nginx.service
           ├─1236 nginx: master process /usr/sbin/nginx -g daemon off;
           ├─1237 nginx: worker process
           ├─1238 nginx: worker process
           ├─1239 nginx: worker process
           └─1240 nginx: worker process

# Check whether the service is running
systemctl is-active nginx
echo $?  # 0=active, 3=inactive

# Check whether the service is enabled at boot
systemctl is-enabled nginx
echo $?  # 0=enabled, 1=disabled

# ========== Process checks ==========
# List Nginx processes
ps aux | grep nginx

# Show the process tree
pstree -p | grep nginx

# Show the master process PID
cat /run/nginx.pid
systemctl show nginx -p MainPID

# Count worker processes ([n]ginx keeps grep from counting itself)
ps aux | grep -c "[n]ginx: worker"

# ========== Port checks ==========
# Show listening ports
ss -tulpn | grep nginx
netstat -tulpn | grep nginx

# Show specific ports (the trailing space avoids matching :8080 or :4430)
ss -tulpn | grep ":80 "
ss -tulpn | grep ":443 "

# Connection statistics
ss -tn | grep ":80 " | wc -l
ss -tn state established | grep ":80 " | wc -l
ss -tn state time-wait | grep ":80 " | wc -l

2.2 Service Status Quick-Diagnosis Table

| State | Command output | Meaning | Action |
| --- | --- | --- | --- |
| active (running) | Active: active (running) | Running normally | None needed |
| inactive (dead) | Active: inactive (dead) | Service not running | systemctl start nginx |
| failed | Active: failed (Result: exit-code) | Startup failed | Check the logs |
| activating | Active: activating (start) | Starting | Wait, or check for a timeout |
| masked | Unit nginx.service is masked | Service is masked | systemctl unmask nginx |
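
The table above can be turned into a small lookup for use in scripts. The following is a minimal sketch, not part of the original article: `suggest_action` is a hypothetical helper name, and it assumes the state string comes from `systemctl is-active` (or from `systemctl is-enabled` for the `masked` case, since `is-active` does not report masking).

```shell
#!/bin/bash
# suggest_action STATE
# Hypothetical helper: maps a systemd unit state to the next step from the
# quick-diagnosis table. It only prints a suggestion and runs nothing itself.
suggest_action() {
    case "$1" in
        active)     echo "no action needed" ;;
        inactive)   echo "systemctl start nginx" ;;
        failed)     echo "journalctl -u nginx -n 50 --no-pager" ;;
        activating) echo "wait, then check for a start timeout" ;;
        masked)     echo "systemctl unmask nginx" ;;
        *)          echo "unknown state: $1" ;;
    esac
}

# Example call (commented out so the snippet can be sourced safely):
# suggest_action "$(systemctl is-active nginx)"
```

Keeping the mapping in one `case` statement means a new state can be added in a single line without touching the monitoring logic.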

2.3 Service Status Monitoring Script

#!/bin/bash
#===============================================================================
# Script name: nginx_status_monitor.sh
# Description: Nginx service status monitoring
#===============================================================================

LOG_FILE="/var/log/nginx_status_monitor.log"
# The address is redacted in the source; set your own recipient here.
ALERT_EMAIL="[email protected]"

log() {
    echo "[$(date '+%Y-%m-%d %H:%M:%S')] $1" | tee -a "$LOG_FILE"
}

send_alert() {
    local subject=$1
    local message=$2
    echo "$message" | mail -s "$subject" "$ALERT_EMAIL" 2>/dev/null
    log "⚠️  Alert sent: $subject"
}

check_nginx_status() {
    local status
    status=$(systemctl is-active nginx 2>/dev/null)

    if [ "$status" != "active" ]; then
        log "⚠️  Alert: Nginx service state is abnormal. Current state: $status"
        send_alert "Nginx service alert" "Nginx service state is abnormal, current state: $status"

        # Attempt an automatic restart
        log "Attempting to restart Nginx..."
        if systemctl restart nginx; then
            log "✓ Nginx restarted successfully"
        else
            log "❌ Nginx restart failed"
            send_alert "Nginx restart failed" "Nginx could not be restarted; manual intervention required"
        fi
        return 1
    fi
    log "✓ Nginx service is running normally"
    return 0
}

check_nginx_port() {
    local port=${1:-80}

    if ! ss -tulpn | grep -q ":$port "; then
        log "⚠️  Alert: port $port is not listening"
        send_alert "Nginx port alert" "Nginx port $port is not listening"
        return 1
    fi
    log "✓ Port $port is listening"
    return 0
}

check_nginx_process() {
    local pid
    pid=$(cat /run/nginx.pid 2>/dev/null)

    if [ -z "$pid" ]; then
        log "⚠️  Alert: Nginx PID file is missing or empty"
        return 1
    fi

    if ! kill -0 "$pid" 2>/dev/null; then
        log "⚠️  Alert: Nginx process $pid does not exist"
        return 1
    fi

    log "✓ Nginx process $pid is running"
    return 0
}

check_nginx_worker() {
    local worker_count expected_worker
    # [n]ginx keeps grep from counting its own process
    worker_count=$(ps aux | grep -c "[n]ginx: worker")
    # worker_processes is a configuration directive; `nginx -V` does not print it
    expected_worker=$(nginx -T 2>/dev/null | awk '$1 == "worker_processes" {gsub(";", "", $2); print $2; exit}')
    expected_worker=${expected_worker:-1}   # Nginx defaults to 1 when unset

    if [ "$worker_count" -eq 0 ]; then
        log "⚠️  Alert: Nginx worker process count is 0"
        return 1
    fi

    log "✓ Nginx worker processes: $worker_count (configured: $expected_worker)"
    return 0
}

# Main checks
log "========== Nginx health check started =========="

check_nginx_status
check_nginx_port 80
check_nginx_port 443
check_nginx_process
check_nginx_worker

log "========== Nginx health check finished =========="

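3. Nginx Configuration File Troubleshooting

3.1 Testing the Configuration

# ========== Configuration test commands ==========
# Test configuration syntax
nginx -t

# Sample output (success)
$ nginx -t
nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
nginx: configuration file /etc/nginx/nginx.conf test is successful

# Sample output (failure)
$ nginx -t
nginx: [emerg] unknown directive "invalid_directive" in /etc/nginx/conf.d/test.conf:10
nginx: configuration file /etc/nginx/nginx.conf test failed

# Test and dump the full effective configuration
nginx -T

# Test a specific configuration file
nginx -t -c /etc/nginx/nginx.conf

# ========== Configuration file locations ==========
# Main configuration file
/etc/nginx/nginx.conf

# Additional configuration directory
/etc/nginx/conf.d/

# Site configuration (Debian/Ubuntu-style layout; usually absent from RPM-based installs such as openEuler)
/etc/nginx/sites-available/
/etc/nginx/sites-enabled/

# Module configuration (Debian/Ubuntu-style layout)
/etc/nginx/modules-enabled/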

3.2 Common Configuration Errors and Fixes

Error 1: Syntax error

# Symptom
$ nginx -t
nginx: [emerg] unknown directive "server_nam" in /etc/nginx/conf.d/default.conf:5

# Cause: misspelled directive
# Wrong
server_nam example.com;

# Correct
server_name example.com;

# Fix
vi /etc/nginx/conf.d/default.conf
# Correct the spelling, then test and reload
nginx -t
systemctl reload nginx

Error 2: Missing semicolon

# Symptom
$ nginx -t
nginx: [emerg] unexpected "}" in /etc/nginx/conf.d/default.conf:15

# Cause: a directive is missing its terminating semicolon
# Wrong
server {
    listen 80
    server_name example.com;
}

# Correct
server {
    listen 80;
    server_name example.com;
}

# Fix
vi /etc/nginx/conf.d/default.conf
# Add the missing semicolon, then test and reload
nginx -t
systemctl reload nginx

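Error 3: Path or host not found

# Symptom
$ nginx -t
nginx: [emerg] host not found in upstream "backend"
# or
nginx: [emerg] open() "/etc/nginx/certs/server.crt" failed (2: No such file or directory)

# Cause: the first message means the upstream host name cannot be resolved
# (check DNS or /etc/hosts); the second means a referenced file path does not exist

# Fix
# 1. Check whether the file exists
ls -la /etc/nginx/certs/

# 2. Create or repair the path
mkdir -p /etc/nginx/certs/
# Place the certificate files here

# 3. Or comment out the related directive
# ssl_certificate /etc/nginx/certs/server.crt;

# 4. Test and reload
nginx -t
systemctl reload nginx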

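Error 4: Port conflict

# Symptom (reported when Nginx starts, for example in `journalctl -u nginx`;
# `nginx -t` typically does not flag a port that is already in use)
nginx: [emerg] bind() to 0.0.0.0:80 failed (98: Address already in use)

# Cause: another process already holds the port

# Fix
# 1. Find out what is using the port
ss -tulpn | grep ":80 "
lsof -i :80
fuser 80/tcp

# 2. Stop the service that holds it (Apache is packaged as httpd on openEuler)
systemctl stop httpd
# or kill whatever holds the port (use with care: this terminates the process)
fuser -k 80/tcp

# 3. Or change the Nginx port
vi /etc/nginx/conf.d/default.conf
# Change listen 80 to listen 8080

# 4. Test and restart
nginx -t
systemctl restart nginx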

Error 5: Permission problem

# Symptom
$ nginx -t
nginx: [emerg] open() "/var/log/nginx/error.log" failed (13: Permission denied)

# Cause: insufficient file permissions

# Fix
# 1. Check file permissions
ls -la /var/log/nginx/

# 2. Repair ownership and mode
chown -R nginx:nginx /var/log/nginx/
chmod 755 /var/log/nginx/

# 3. Check SELinux
getenforce
ls -Z /var/log/nginx/

# 4. Restore the SELinux context
restorecon -Rv /var/log/nginx/

# 5. Test and restart
nginx -t
systemctl restart nginx
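
Every fix above ends with the same two steps: test the configuration, then reload or restart. A minimal sketch of wrapping that pattern in one function follows. `safe_reload`, `NGINX_TEST_CMD`, and `NGINX_RELOAD_CMD` are hypothetical names introduced here (the two variables exist only so the function can be tried without a real Nginx); the defaults are the commands used throughout this section.

```shell
#!/bin/bash
# safe_reload: reload Nginx only if the configuration test passes.
# NGINX_TEST_CMD / NGINX_RELOAD_CMD are illustrative override hooks,
# not settings that Nginx or systemd know about.
safe_reload() {
    local test_cmd="${NGINX_TEST_CMD:-nginx -t}"
    local reload_cmd="${NGINX_RELOAD_CMD:-systemctl reload nginx}"

    # Unquoted on purpose so "nginx -t" splits into command and argument
    if $test_cmd; then
        $reload_cmd && echo "reloaded"
    else
        echo "config test failed; running configuration left untouched" >&2
        return 1
    fi
}

# Example call (commented out so the snippet can be sourced safely):
# safe_reload
```

Because a failed test returns before the reload command runs, a typo in a new config file cannot take down a server that is currently healthy.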

3.3 Configuration Check Script

#!/bin/bash
#===============================================================================
# Script name: nginx_config_check.sh
# Description: Nginx configuration check script
#===============================================================================

RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m'

log_info() { echo -e "${GREEN}[INFO]${NC} $1"; }
log_warn() { echo -e "${YELLOW}[WARN]${NC} $1"; }
log_error() { echo -e "${RED}[ERROR]${NC} $1"; }

echo "=============================================="
echo "  Nginx configuration check"
echo "=============================================="

# 1. Test configuration syntax (run once, keep the output)
log_info "[1. Configuration syntax test]"
if test_output=$(nginx -t 2>&1); then
    log_info "✓ Configuration syntax check passed"
else
    log_error "❌ Configuration syntax check failed"
fi
echo "$test_output"

# 2. Check the main configuration file
log_info "[2. Main configuration file]"
if [ -f /etc/nginx/nginx.conf ]; then
    log_info "✓ Main configuration file exists"
    ls -la /etc/nginx/nginx.conf
else
    log_error "❌ Main configuration file is missing"
fi

# 3. Check additional configuration
log_info "[3. Additional configuration files]"
if [ -d /etc/nginx/conf.d/ ]; then
    log_info "✓ conf.d directory exists"
    ls -la /etc/nginx/conf.d/
else
    log_warn "⚠ conf.d directory does not exist"
fi

# 4. Check site configuration
log_info "[4. Site configuration]"
if [ -d /etc/nginx/sites-enabled/ ]; then
    log_info "✓ sites-enabled directory exists"
    ls -la /etc/nginx/sites-enabled/
else
    log_warn "⚠ sites-enabled directory does not exist"
fi

# 5. Check the SSL certificate
# -h drops the file-name prefix; the pattern skips comments and ssl_certificate_key
log_info "[5. SSL certificate]"
ssl_cert=$(grep -rhE '^[[:space:]]*ssl_certificate[[:space:]]' /etc/nginx/ 2>/dev/null | head -1 | awk '{print $2}' | tr -d ';"')
if [ -n "$ssl_cert" ] && [ -f "$ssl_cert" ]; then
    log_info "✓ SSL certificate exists: $ssl_cert"
    ls -la "$ssl_cert"
else
    log_warn "⚠ SSL certificate not configured or file not found"
fi

# 6. Check the log directory
log_info "[6. Log directory]"
if [ -d /var/log/nginx/ ]; then
    log_info "✓ Log directory exists"
    ls -la /var/log/nginx/
else
    log_error "❌ Log directory does not exist; creating it"
    mkdir -p /var/log/nginx/
    chown nginx:nginx /var/log/nginx/
fi

# 7. Check the run-as user
log_info "[7. Run-as user]"
nginx_user=$(grep -E "^user" /etc/nginx/nginx.conf | awk '{print $2}' | tr -d ';')
if [ -z "$nginx_user" ]; then
    nginx_user="nginx"
fi
log_info "Nginx run-as user: $nginx_user"
id "$nginx_user" 2>/dev/null || log_warn "⚠ User $nginx_user does not exist"

# 8. Check the worker process setting
log_info "[8. Worker process configuration]"
worker_proc=$(grep -E "^[[:space:]]*worker_processes" /etc/nginx/nginx.conf | awk '{print $2}' | tr -d ';')
log_info "worker_processes: $worker_proc"
cpu_cores=$(nproc)
log_info "CPU cores: $cpu_cores"

# 9. Dump the full configuration
log_info "[9. Full configuration dump]"
nginx -T > /tmp/nginx_full_config.txt 2>&1
echo "Configuration written to: /tmp/nginx_full_config.txt"

echo ""
echo "=============================================="
echo "  Check finished"
echo "=============================================="

4. Nginx Log Analysis

4.1 Nginx Log Locations

# ========== Log file locations ==========
# Error log
/var/log/nginx/error.log

# Access log
/var/log/nginx/access.log

# systemd journal
journalctl -u nginx

# ========== Log levels ==========
# error.log levels, from most verbose to most severe
debug    - debugging information
info     - informational
notice   - notice
warn     - warning
error    - error (default)
crit     - critical
alert    - alert
emerg    - emergency

# Set the log level
error_log /var/log/nginx/error.log warn;
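
The level set in `error_log` acts as a threshold: messages at that severity and above are written, less severe ones are dropped. As an illustration only, the sketch below encodes the ordering from the list above; `level_rank` and `is_logged` are hypothetical helper names, not Nginx commands.

```shell
#!/bin/bash
# level_rank LEVEL: print the position of an error_log severity,
# from most verbose (0) to most severe (7). Fails on unknown names.
level_rank() {
    case "$1" in
        debug) echo 0 ;; info)  echo 1 ;; notice) echo 2 ;; warn)  echo 3 ;;
        error) echo 4 ;; crit)  echo 5 ;; alert)  echo 6 ;; emerg) echo 7 ;;
        *) return 1 ;;
    esac
}

# is_logged CONFIGURED MESSAGE: succeeds when a message of severity MESSAGE
# would be written with error_log set to CONFIGURED; returns 2 on bad input.
is_logged() {
    local configured message
    configured=$(level_rank "$1") || return 2
    message=$(level_rank "$2") || return 2
    [ "$message" -ge "$configured" ]
}

# Example (commented out so the snippet can be sourced safely):
# is_logged warn error && echo "error messages appear at level warn"
```

With the `warn` setting shown above, `is_logged warn notice` fails, which is why `notice` entries do not show up until the level is lowered.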

4.2 Error Log Analysis

# ========== Viewing the error log ==========
# Follow in real time
tail -f /var/log/nginx/error.log

# Show the last 100 lines
tail -100 /var/log/nginx/error.log

# Show entries at error level
grep "\[error\]" /var/log/nginx/error.log

# Show a time range (string comparison on the "YYYY/MM/DD HH:MM:SS" prefix)
awk '$1" "$2 >= "2024/01/15 10:00:00" && $1" "$2 <= "2024/01/15 12:00:00"' /var/log/nginx/error.log

# Count error types
awk -F']' '{print $2}' /var/log/nginx/error.log | sort | uniq -c | sort -rn

# ========== Common errors and what they mean ==========
# 1. Permission error
open() "/var/www/html/index.html" failed (13: Permission denied)
# Fix: chown -R nginx:nginx /var/www/html/

# 2. File not found
open() "/var/www/html/missing.html" failed (2: No such file or directory)
# Fix: create the file or correct the configuration

# 3. Upstream connection failed
connect() failed (111: Connection refused) while connecting to upstream
# Fix: check the backend service

# 4. Timeout
upstream timed out (110: Connection timed out)
# Fix: increase the timeout or optimize the backend

# 5. Port bind failed
bind() to 0.0.0.0:80 failed (98: Address already in use)
# Fix: free the port or change the configuration
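
The five messages above each carry a distinctive errno fragment, so they can be bucketed mechanically. A minimal sketch, assuming the log lines contain those fragments verbatim; `classify_error` and its category labels are names made up for this example.

```shell
#!/bin/bash
# classify_error: read error-log lines on stdin and print one category per
# line, keyed on the errno text Nginx embeds in the message.
classify_error() {
    local line
    while IFS= read -r line; do
        case "$line" in
            *"(13: Permission denied)"*)        echo "permission" ;;
            *"(2: No such file or directory)"*) echo "missing-file" ;;
            *"(111: Connection refused)"*)      echo "upstream-refused" ;;
            *"(110: Connection timed out)"*)    echo "upstream-timeout" ;;
            *"(98: Address already in use)"*)   echo "port-in-use" ;;
            *)                                  echo "other" ;;
        esac
    done
}

# Example (commented out so the snippet can be sourced safely):
# classify_error < /var/log/nginx/error.log | sort | uniq -c | sort -rn
```

Piping the result through `sort | uniq -c` gives a per-category count, which is usually a quicker first look than reading raw messages.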

4.3 Access Log Analysis

# ========== Access log format ==========
# Default format
log_format main '$remote_addr - $remote_user [$time_local] "$request" '
                '$status $body_bytes_sent "$http_referer" '
                '"$http_user_agent" "$http_x_forwarded_for"';

# ========== Access log analysis commands ==========
# Show recent requests
tail -100 /var/log/nginx/access.log

# Count by status code
awk '{print $9}' /var/log/nginx/access.log | sort | uniq -c | sort -rn

# Top 10 client IPs
awk '{print $1}' /var/log/nginx/access.log | sort | uniq -c | sort -rn | head -10

# Top 10 requested URLs
awk '{print $7}' /var/log/nginx/access.log | sort | uniq -c | sort -rn | head -10

# List 4xx errors
awk '$9 ~ /^4/ {print $0}' /var/log/nginx/access.log

# List 5xx errors
awk '$9 ~ /^5/ {print $0}' /var/log/nginx/access.log

# List slow requests (response time > 1s)
# Only valid if log_format ends with $request_time; the default format above does not
awk '($NF > 1) {print $0}' /var/log/nginx/access.log

# Find requests from a specific IP
grep "192.168.1.100" /var/log/nginx/access.log

# Find requests for a specific URL
grep "/api/login" /var/log/nginx/access.log

# Count requests per hour (second colon-separated field is the hour)
awk -F: '{print $2}' /var/log/nginx/access.log | sort | uniq -c
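
When triaging, the split between 2xx, 4xx, and 5xx is often more useful than individual codes. The sketch below groups the status field by class; it assumes the default `main` layout shown above, where the status code is the ninth whitespace-separated field. `status_classes` is a hypothetical function name.

```shell
#!/bin/bash
# status_classes [FILE...]: count access-log lines per status class (2xx,
# 4xx, ...). Reads stdin when no file is given. Assumes field 9 is $status.
status_classes() {
    awk '$9 ~ /^[1-5][0-9][0-9]$/ { n[substr($9, 1, 1) "xx"]++ }
         END { for (c in n) print c, n[c] }' "$@" | sort
}

# Example (commented out so the snippet can be sourced safely):
# status_classes /var/log/nginx/access.log
```

Lines whose ninth field is not a three-digit status (for example malformed requests) are skipped rather than counted under a misleading class.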

4.4 Log Analysis Script

#!/bin/bash
#===============================================================================
# Script name: nginx_log_analyzer.sh
# Description: Nginx log analysis script
#===============================================================================

ACCESS_LOG="/var/log/nginx/access.log"
ERROR_LOG="/var/log/nginx/error.log"
# NOTE: HOURS is only shown in the report header; the commands below scan the whole file
HOURS=${1:-24}

echo "=============================================="
echo "  Nginx log analysis report"
echo "  Time range: last ${HOURS} hours"
echo "=============================================="

# 1. Access log totals
echo -e "\n[1. Access log totals]"
echo "Total requests: $(wc -l < "$ACCESS_LOG")"
echo ""

# 2. Status code distribution
echo "[2. Status code distribution]"
awk '{print $9}' "$ACCESS_LOG" | sort | uniq -c | sort -rn | head -10

# 3. Top 10 IPs
echo -e "\n[3. Top 10 client IPs]"
awk '{print $1}' "$ACCESS_LOG" | sort | uniq -c | sort -rn | head -10

# 4. Top 10 requested URLs
echo -e "\n[4. Top 10 requested URLs]"
awk '{print $7}' "$ACCESS_LOG" | sort | uniq -c | sort -rn | head -10

# 5. 4xx errors
echo -e "\n[5. 4xx client errors]"
awk '$9 ~ /^4/ {print $9, $7}' "$ACCESS_LOG" | sort | uniq -c | sort -rn | head -10

# 6. 5xx errors
echo -e "\n[6. 5xx server errors]"
awk '$9 ~ /^5/ {print $9, $7, $NF}' "$ACCESS_LOG" | sort | uniq -c | sort -rn | head -10

# 7. Error log summary
echo -e "\n[7. Error log summary]"
echo "Total error-log lines: $(wc -l < "$ERROR_LOG")"
echo ""
echo "Error type distribution:"
awk -F']' '{print $2}' "$ERROR_LOG" | sort | uniq -c | sort -rn | head -10

# 8. Most recent errors
echo -e "\n[8. Last 20 error-log lines]"
tail -20 "$ERROR_LOG"

echo ""
echo "=============================================="
echo "  Analysis finished"
echo "=============================================="

5. Startup Failure Troubleshooting

5.1 Startup Failure Workflow

Nginx fails to start
                │
                ▼
┌────────────────────────────────┐
│ 1. systemctl status nginx      │
│    Check the service state     │
└───────────────┬────────────────┘
                │
                ▼
┌────────────────────────────────┐
│ 2. nginx -t                    │
│    Test the configuration      │
└───────────────┬────────────────┘
                │
                ▼
┌────────────────────────────────┐
│ 3. journalctl -u nginx         │
│    Read the detailed logs      │
└───────────────┬────────────────┘
                │
                ▼
┌────────────────────────────────┐
│ 4. Check port usage            │
│    ss -tulpn                   │
└───────────────┬────────────────┘
                │
                ▼
┌────────────────────────────────┐
│ 5. Check file permissions      │
│    ls -la                      │
└───────────────┬────────────────┘
                │
                ▼
┌────────────────────────────────┐
│ 6. Start manually to test      │
│    nginx -g "daemon off;"      │
└────────────────────────────────┘

5.2 Startup Failure Cases

Case 1: Startup fails because of a configuration error

# Symptom
$ systemctl status nginx
● nginx.service - The nginx HTTP and reverse proxy server
   Loaded: loaded (/usr/lib/systemd/system/nginx.service; enabled)
   Active: failed (Result: exit-code) since Mon 2024-01-15 10:00:00 CST
  Process: 1234 ExecStartPre=/usr/sbin/nginx -t (code=exited, status=1/FAILURE)

# Troubleshooting steps
# 1. Run the configuration test
nginx -t

# Output
nginx: [emerg] unknown directive "server_nam" in /etc/nginx/conf.d/default.conf:5

# 2. Read the full log
journalctl -u nginx -n 50 --no-pager

# 3. Fix the configuration
vi /etc/nginx/conf.d/default.conf
# Correct the spelling: server_nam -> server_name

# 4. Test the configuration
nginx -t

# 5. Start the service
systemctl start nginx

# 6. Verify
systemctl status nginx

Case 2: Startup fails because the port is in use

# Symptom
$ systemctl status nginx
● nginx.service - The nginx HTTP and reverse proxy server
   Active: failed (Result: exit-code)

# Troubleshooting steps
# 1. Check the error log
journalctl -u nginx -n 30 --no-pager | grep -i "bind\|address"

# Output
nginx: [emerg] bind() to 0.0.0.0:80 failed (98: Address already in use)

# 2. Find out what is using the port
ss -tulpn | grep ":80 "
# or
lsof -i :80
# or
fuser 80/tcp

# Output (Apache runs as httpd on openEuler)
tcp   LISTEN  0  128  0.0.0.0:80  0.0.0.0:*  users:(("httpd",pid=5678))

# 3. Fix
# Option A: stop the service that holds the port
systemctl stop httpd
systemctl disable httpd

# Option B: change the Nginx port
vi /etc/nginx/conf.d/default.conf
# Change: listen 80 -> listen 8080

# 4. Test and start
nginx -t
systemctl start nginx

# 5. Verify
ss -tulpn | grep nginx

Case 3: Startup fails because of a permission problem

# Symptom
$ systemctl status nginx
● nginx.service - The nginx HTTP and reverse proxy server
   Active: failed (Result: exit-code)

# Troubleshooting steps
# 1. Check the error log
journalctl -u nginx -n 30 --no-pager | grep -i "permission\|denied"

# Output
nginx: [emerg] open() "/var/log/nginx/error.log" failed (13: Permission denied)

# 2. Check file permissions
ls -la /var/log/nginx/
ls -la /var/run/nginx.pid

# 3. Check the Nginx run-as user
grep "^user" /etc/nginx/nginx.conf
# Output: user nginx;

# 4. Repair permissions
chown -R nginx:nginx /var/log/nginx/
chmod 755 /var/log/nginx/
# Do NOT chown /var/run recursively: it is shared by every service on the system.
# The master process runs as root and writes /var/run/nginx.pid itself.

# 5. Check SELinux
getenforce
ls -Z /var/log/nginx/

# 6. Repair SELinux labels (if needed)
restorecon -Rv /var/log/nginx/
# or switch to permissive mode temporarily, for diagnosis only
# (revert with: setenforce 1)
setenforce 0

# 7. Test and start
nginx -t
systemctl start nginx

Case 4: Startup fails because of an SSL certificate problem

# Symptom
$ systemctl status nginx
● nginx.service - The nginx HTTP and reverse proxy server
   Active: failed (Result: exit-code)

# Troubleshooting steps
# 1. Check the error log
journalctl -u nginx -n 30 --no-pager | grep -i "ssl\|certificate"

# Output
nginx: [emerg] cannot load certificate "/etc/nginx/certs/server.crt": BIO_new_file() failed (SSL: error:02001002:system library:fopen:No such file or directory)

# 2. Check the certificate files
ls -la /etc/nginx/certs/

# 3. Fix
# Option A: install the certificate files
cp /path/to/server.crt /etc/nginx/certs/
cp /path/to/server.key /etc/nginx/certs/
chmod 600 /etc/nginx/certs/server.key

# Option B: disable SSL temporarily
vi /etc/nginx/conf.d/ssl.conf
# Comment out the SSL-related directives

# 4. Test and start
nginx -t
systemctl start nginx

5.3 Startup Failure Debug Script

#!/bin/bash
#===============================================================================
# Script name: nginx_start_debug.sh
# Description: Nginx startup failure debug script
#===============================================================================

RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m'

log_info() { echo -e "${GREEN}[INFO]${NC} $1"; }
log_warn() { echo -e "${YELLOW}[WARN]${NC} $1"; }
log_error() { echo -e "${RED}[ERROR]${NC} $1"; }

echo "=============================================="
echo "  Nginx startup failure diagnostics"
echo "=============================================="

# 1. Service status
log_info "[1. Service status]"
systemctl status nginx --no-pager -l

# 2. Configuration test (run once, keep the output)
log_info "[2. Configuration test]"
if test_output=$(nginx -t 2>&1); then
    log_info "✓ Configuration test passed"
else
    log_error "❌ Configuration test failed"
fi
echo "$test_output"

# 3. Error log
log_info "[3. Error log]"
journalctl -u nginx -p err --no-pager -n 50

# 4. Port usage
log_info "[4. Port usage]"
for port in 80 443; do
    echo "Port $port:"
    ss -tulpn | grep ":$port " || echo "  not in use"
done

# 5. File permissions
log_info "[5. File permissions]"
echo "Log directory:"
ls -la /var/log/nginx/ 2>/dev/null || log_warn "directory does not exist"

echo "Runtime files:"
ls -la /var/run/nginx* 2>/dev/null || log_warn "no files found"

echo "Configuration directory:"
ls -la /etc/nginx/ 2>/dev/null || log_warn "directory does not exist"

# 6. SSL certificates and keys referenced in the configuration
log_info "[6. SSL certificate check]"
ssl_files=$(grep -rhE '^[[:space:]]*ssl_certificate' /etc/nginx/ 2>/dev/null | awk '{print $2}' | tr -d ';"')
for cert in $ssl_files; do
    if [ -f "$cert" ]; then
        log_info "✓ File exists: $cert"
        ls -la "$cert"
    else
        log_warn "⚠ File missing: $cert"
    fi
done

# 7. SELinux
log_info "[7. SELinux status]"
getenforce 2>/dev/null || echo "SELinux is not installed"

# 8. Manual start test (fails by design if another instance holds the ports)
log_info "[8. Manual start test]"
nginx -g "daemon off;" &
NGINX_PID=$!
sleep 2
if kill -0 "$NGINX_PID" 2>/dev/null; then
    log_info "✓ Manual start succeeded"
    kill "$NGINX_PID"
else
    log_error "❌ Manual start failed"
fi

echo ""
echo "=============================================="
echo "  Diagnostics finished"
echo "=============================================="

6. Runtime Error Troubleshooting

6.1 502 Bad Gateway

# ========== Symptom ==========
# The browser shows 502 Bad Gateway

# ========== Troubleshooting steps ==========
# 1. Watch the Nginx error log
tail -f /var/log/nginx/error.log | grep "502\|upstream"

# Common messages
connect() failed (111: Connection refused) while connecting to upstream
upstream prematurely closed connection

# 2. Check the backend service
systemctl status php-fpm
systemctl status tomcat
systemctl status node

# 3. Check the backend port
ss -tulpn | grep php-fpm
ss -tulpn | grep :8080

# 4. Check the upstream configuration
nginx -T | grep -A 10 "upstream"

# 5. Test the backend connection
curl -v http://127.0.0.1:8080/
telnet 127.0.0.1 8080

# 6. Check the timeout settings
nginx -T | grep -E "proxy_connect_timeout|proxy_send_timeout|proxy_read_timeout"

# 7. Increase the timeouts (temporary workaround)
vi /etc/nginx/conf.d/upstream.conf
# Add:
# proxy_connect_timeout 60s;
# proxy_send_timeout 60s;
# proxy_read_timeout 60s;

# 8. Test and reload Nginx
nginx -t
systemctl reload nginx

6.2 503 Service Unavailable

# ========== Symptom ==========
# The browser shows 503 Service Unavailable

# ========== Troubleshooting steps ==========
# 1. Watch the error log
tail -f /var/log/nginx/error.log | grep "503\|upstream"

# Common messages
no live upstreams while connecting to upstream
upstream server temporarily disabled

# 2. Check the upstream servers
nginx -T | grep -A 20 "upstream"

# 3. Check backend health
for server in 192.168.1.10 192.168.1.11; do
    curl -o /dev/null -s -w "%{http_code}\n" "http://$server:8080/health"
done

# 4. Check the max_fails setting
# A backend that fails repeatedly may be marked unavailable

# 5. Temporary workaround
# Comment out the failing upstream server,
# or point directly at a healthy backend

# 6. Check connection and failure limits
nginx -T | grep -E "max_conns|max_fails|fail_timeout"

6.3 504 Gateway Timeout

# ========== Symptom ==========
# The browser shows 504 Gateway Timeout

# ========== Troubleshooting steps ==========
# 1. Watch the error log
tail -f /var/log/nginx/error.log | grep "504\|timed out"

# Common message
upstream timed out (110: Connection timed out)

# 2. Check the timeout settings
nginx -T | grep -E "proxy_connect_timeout|proxy_send_timeout|proxy_read_timeout"

# 3. Increase the timeouts
vi /etc/nginx/conf.d/proxy.conf
# Add:
# proxy_connect_timeout 60s;
# proxy_send_timeout 60s;
# proxy_read_timeout 60s;

# 4. Check backend performance
# If the backend is slow, it needs to be optimized

# 5. Check network latency
ping <backend-server-ip>
traceroute <backend-server-ip>

# 6. Test and reload Nginx
nginx -t
systemctl reload nginx

6.4 403 Forbidden

# ========== Symptom ==========
# The browser shows 403 Forbidden

# ========== Troubleshooting steps ==========
# 1. Watch the error log
tail -f /var/log/nginx/error.log | grep -i "403\|forbidden\|permission"

# Common messages
directory index of "/var/www/html/" is forbidden
open() "/var/www/html/index.html" failed (13: Permission denied)

# 2. Check file permissions
ls -la /var/www/html/
ls -la /var/www/html/index.html

# 3. Repair permissions (directories 755, files 644)
chown -R nginx:nginx /var/www/html/
find /var/www/html/ -type d -exec chmod 755 {} +
find /var/www/html/ -type f -exec chmod 644 {} +

# 4. Check SELinux
getenforce
ls -Z /var/www/html/

# 5. Restore SELinux labels
restorecon -Rv /var/www/html/

# 6. Check the autoindex setting
nginx -T | grep autoindex
# If a directory listing is wanted, add:
# autoindex on;

# 7. Check the index file
nginx -T | grep -w index
# Make sure the index file exists
ls -la /var/www/html/index.html

6.5 404 Not Found

# ========== Symptom ==========
# The browser shows 404 Not Found

# ========== Troubleshooting steps ==========
# 1. Watch the error log
tail -f /var/log/nginx/error.log | grep "404\|No such file"

# Common message
open() "/var/www/html/missing.html" failed (2: No such file or directory)

# 2. Check whether the file exists
ls -la /var/www/html/missing.html

# 3. Check the root configuration
nginx -T | grep -E "root|location"

# 4. Check the try_files configuration
nginx -T | grep try_files

# 5. Check rewrite rules
nginx -T | grep rewrite

# 6. Check symbolic links
ls -la /var/www/html/
# Make sure every symlink target exists

7. Performance Troubleshooting

7.1 High CPU Usage

# ========== Troubleshooting steps ==========
# 1. Check CPU usage
top -bn1 | grep nginx
ps aux --sort=-%cpu | grep nginx | head -5

# 2. Inspect the master process
pidstat -u -p $(cat /run/nginx.pid) 1 5

# 3. List the worker processes
ps aux | grep "nginx: worker"

# 4. Check the worker_processes setting
nginx -T | grep worker_processes
# Recommended: match the number of CPU cores
# worker_processes auto;

# 5. Check the connection count
ss -tn | grep ":80 " | wc -l

# 6. Check request-handling settings
nginx -T | grep -E "worker_connections|keepalive"

# 7. Tune the configuration
vi /etc/nginx/nginx.conf
# Add or change:
# worker_processes auto;
# worker_rlimit_nofile 65535;
# events {
#     worker_connections 65535;
#     use epoll;
#     multi_accept on;
# }

# 8. Enable caching
# to reduce requests to the backend

7.2 High Memory Usage

# ========== Troubleshooting steps ==========
# 1. Check memory usage
free -h
ps aux --sort=-%mem | grep nginx

# 2. Check the memory of the master process
grep -E "VmSize|VmRSS|VmSwap" /proc/$(cat /run/nginx.pid)/status

# 3. Check buffer settings
nginx -T | grep -E "buffer_size|buffers"

# 4. Check cache settings
nginx -T | grep -E "proxy_cache|fastcgi_cache"

# 5. Tune the configuration
vi /etc/nginx/nginx.conf
# Adjust buffer sizes:
# proxy_buffer_size 4k;
# proxy_buffers 4 32k;
# proxy_busy_buffers_size 64k;

# 6. Limit connections
# events {
#     worker_connections 1024;
# }

# 7. Restart to release memory
systemctl restart nginx

7.3 Too Many Connections

# ========== Troubleshooting steps ==========
# 1. Connection statistics
ss -tn | grep ":80 " | wc -l
ss -tn state established | grep ":80 " | wc -l
ss -tn state time-wait | grep ":80 " | wc -l

# 2. Check Nginx status
# Enable the stub_status module (inside a server block)
location /nginx_status {
    stub_status on;
    access_log off;
    allow 127.0.0.1;
    deny all;
}

# Query it
curl http://localhost/nginx_status

# 3. Check connection limits
nginx -T | grep -E "limit_conn|limit_req"

# 4. Tune the configuration
vi /etc/nginx/nginx.conf
# Add:
# events {
#     worker_connections 65535;
#     multi_accept on;
# }
# http {
#     keepalive_timeout 65;
#     keepalive_requests 100;
# }

# 5. Limit connections per client IP
# (limit_conn_zone goes in the http block; limit_conn in http, server, or location)
limit_conn_zone $binary_remote_addr zone=addr:10m;
limit_conn addr 100;

# 6. Check for a DDoS attack
# Look for abnormal client IPs
awk '{print $1}' /var/log/nginx/access.log | sort | uniq -c | sort -rn | head -10

7.4 Slow Responses

# ========== Troubleshooting steps ==========
# 1. Log response times
vi /etc/nginx/nginx.conf
# Extend log_format so the last two fields are $request_time and $upstream_response_time:
log_format main '$remote_addr - $remote_user [$time_local] "$request" '
                '$status $body_bytes_sent "$http_referer" '
                '"$http_user_agent" "$http_x_forwarded_for" '
                '$request_time $upstream_response_time';

# 2. Find slow requests ($request_time is the second-to-last field)
awk '($(NF-1) > 1) {print $0}' /var/log/nginx/access.log | head -20

# 3. Average request time
awk '{sum+=$(NF-1); count++} END {if (count) print "Average request time:", sum/count}' /var/log/nginx/access.log

# 4. Check backend response times ($upstream_response_time is the last field)
awk '{print $NF}' /var/log/nginx/access.log | sort -n | tail -20

# 5. Tune the configuration
# Enable caching
proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=my_cache:10m max_size=1g inactive=60m;
proxy_cache my_cache;
proxy_cache_valid 200 302 10m;

# Enable gzip compression
gzip on;
gzip_types text/plain application/json application/javascript text/css;

# 6. Check network latency
ping <backend-server>
traceroute <backend-server>
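
An average hides outliers, so a high percentile is often a better signal for slow responses. The following sketch computes a nearest-rank 95th percentile of `$request_time`. It assumes the log format from step 1, where `$request_time` is the second-to-last field; `request_time_p95` is a hypothetical function name.

```shell
#!/bin/bash
# request_time_p95 [FILE...]: print the 95th-percentile request time using
# the nearest-rank method. Reads stdin when no file is given.
# Assumes $request_time is the second-to-last field of each log line.
request_time_p95() {
    awk '$(NF-1) ~ /^[0-9.]+$/ { print $(NF-1) }' "$@" | sort -n |
        awk '{ v[NR] = $1 }
             END {
                 if (NR == 0) exit 1
                 idx = int((NR * 95 + 99) / 100)   # ceil(NR * 0.95) in integer math
                 print v[idx]
             }'
}

# Example (commented out so the snippet can be sourced safely):
# request_time_p95 /var/log/nginx/access.log
```

Lines whose second-to-last field is not numeric are ignored, so entries written before the format change do not distort the result.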

8. Network Connection Troubleshooting

8.1 Port Unreachable

# ========== Troubleshooting steps ==========
# 1. Check that Nginx is listening on the port
ss -tulpn | grep nginx
netstat -tulpn | grep nginx

# 2. Check the firewall
firewall-cmd --list-all
iptables -L -n | grep :80

# 3. Open the ports
firewall-cmd --permanent --add-service=http
firewall-cmd --permanent --add-service=https
firewall-cmd --reload

# 4. Check SELinux
getenforce
getsebool -a | grep httpd

# 5. Allow Nginx to make outbound network connections (needed when proxying to upstreams)
setsebool -P httpd_can_network_connect 1

# 6. Test locally
curl -v http://127.0.0.1:80/

# 7. Test remotely
telnet <server-ip> 80
nc -zv <server-ip> 80

8.2 Connection Refused

# ========== Troubleshooting steps ==========
# 1. Check Nginx status
systemctl status nginx

# 2. Check port listening
ss -tulpn | grep ":80 "

# 3. Check the listen configuration
nginx -T | grep listen

# 4. Check the bind address
# Make sure Nginx listens on the right address
# listen 80;                  # all interfaces
# listen 192.168.1.100:80;    # a specific interface

# 5. Check the firewall
firewall-cmd --list-all

# 6. Check the TCP backlog
sysctl net.core.somaxconn
# Suggested value: 65535

# 7. Check the connection summary
ss -s

8.3 Connection Timeouts

# ========== Troubleshooting steps ==========
# 1. Check the timeout settings
nginx -T | grep -E "timeout"

# 2. Client-side timeouts to review
# client_body_timeout
# client_header_timeout
# send_timeout

# 3. Proxy timeouts to review
# proxy_connect_timeout
# proxy_send_timeout
# proxy_read_timeout

# 4. Tune the configuration
vi /etc/nginx/conf.d/timeout.conf
# Add:
# client_body_timeout 60s;
# client_header_timeout 60s;
# send_timeout 60s;
# proxy_connect_timeout 60s;
# proxy_send_timeout 60s;
# proxy_read_timeout 60s;

# 5. Check network latency
ping <client-or-server>
mtr <client-or-server>

# 6. Check TCP parameters
sysctl net.ipv4.tcp_keepalive_time
sysctl net.ipv4.tcp_fin_timeout

9. SSL/TLS Troubleshooting

9.1 Troubleshooting SSL Certificates

# ========== Troubleshooting steps ==========
# 1. Check the certificate files
ls -la /etc/nginx/certs/

# 2. Inspect the certificate
openssl x509 -in /etc/nginx/certs/server.crt -text -noout

# 3. Check the validity period
openssl x509 -in /etc/nginx/certs/server.crt -noout -dates

# 4. Verify the certificate chain
openssl verify -CAfile /etc/nginx/certs/ca-bundle.crt /etc/nginx/certs/server.crt

# 5. Check the private key (RSA keys; -noout keeps the key itself off the terminal)
openssl rsa -in /etc/nginx/certs/server.key -check -noout

# 6. Confirm that the certificate and private key match (RSA keys)
openssl x509 -noout -modulus -in /etc/nginx/certs/server.crt | openssl md5
openssl rsa -noout -modulus -in /etc/nginx/certs/server.key | openssl md5
# The two MD5 digests should be identical

# 7. Test the SSL connection (</dev/null ends the session after the handshake)
openssl s_client -connect localhost:443 -servername example.com </dev/null

# 8. Online test
# https://www.ssllabs.com/ssltest/
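
Step 3 prints the raw dates, but for monitoring it is usually easier to work with "days left". The following is a minimal sketch under stated assumptions: the function name `cert_days_left` is hypothetical, and it relies on GNU `date -d`, which coreutils provides.

```shell
#!/bin/bash
# Hypothetical helper: days remaining before a certificate expires.
#   $1: the "notAfter=..." line printed by `openssl x509 -noout -enddate`
#   $2: optional reference time in epoch seconds (defaults to now)
# Assumes GNU date, which understands the date format openssl prints.
cert_days_left() {
    local end_date="${1#notAfter=}"
    local now="${2:-$(date +%s)}"
    local end_epoch
    end_epoch=$(date -d "$end_date" +%s) || return 1
    echo $(( (end_epoch - now) / 86400 ))
}
```

Usage: `cert_days_left "$(openssl x509 -in /etc/nginx/certs/server.crt -noout -enddate)"`. A negative result means the certificate has already expired. If only a yes/no answer is needed, `openssl x509 -checkend <seconds>` does the comparison without any date parsing.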

9.2 Troubleshooting Unreachable HTTPS

# ========== Troubleshooting steps ==========
# 1. Check that port 443 is listening
ss -tulpn | grep :443

# 2. Check the SSL server block
nginx -T | grep -A 10 "listen 443"

# 3. Check certificate paths
nginx -T | grep ssl_certificate

# 4. Check SSL protocols and ciphers
nginx -T | grep -E "ssl_protocols|ssl_ciphers"

# 5. Recommended settings (nginx directives for the http or server block, not shell commands)
ssl_protocols TLSv1.2 TLSv1.3;
ssl_ciphers HIGH:!aNULL:!MD5;
ssl_prefer_server_ciphers on;
ssl_session_cache shared:SSL:10m;
ssl_session_timeout 10m;

# 6. Test HTTPS (-k skips certificate verification)
curl -kv https://localhost/

# 7. Check the firewall
firewall-cmd --list-services | grep https
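
Step 4 can be turned into a quick check for protocol versions older than the recommended TLSv1.2/TLSv1.3. This is a sketch only; the name `weak_ssl_protocols` is an assumption. It reads configuration text on stdin and prints any `ssl_protocols` line that still enables SSLv3, TLSv1 or TLSv1.1.

```shell
#!/bin/bash
# Hypothetical helper: read nginx configuration text on stdin and print every
# ssl_protocols line that still enables SSLv3, TLSv1 or TLSv1.1.
weak_ssl_protocols() {
    grep -E '^[[:space:]]*ssl_protocols[[:space:]]' \
        | grep -E '(SSLv3|TLSv1|TLSv1\.1)([[:space:]]|;)'
}
```

Usage: `nginx -T 2>/dev/null | weak_ssl_protocols`. No output means every active `ssl_protocols` line is limited to TLSv1.2 and newer.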

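10. Common Fault Case Library

10.1 Quick Reference Table

| Symptom | Likely cause | Diagnostic command | Fix |
| --- | --- | --- | --- |
| 502 Bad Gateway | Backend service not running | systemctl status php-fpm | Start the backend service |
| 503 Service Unavailable | All upstream servers unavailable | nginx -T \| grep upstream | Repair the backend servers |
| 504 Gateway Timeout | Backend response timed out | tail -f /var/log/nginx/error.log | Increase the timeouts |
| 403 Forbidden | Wrong file permissions | ls -la /var/www/ | Fix the file permissions |
| 404 Not Found | File does not exist | ls -la <file path> | Create the file or correct the path |
| Startup failure | Configuration syntax error | nginx -t | Fix the configuration |
| Startup failure | Port already in use | ss -tulpn \| grep :80 | Free the port |
| High CPU usage | Too many workers | ps aux \| grep nginx | Adjust worker_processes |
| High memory usage | Buffers too large | nginx -T \| grep buffer | Adjust the buffer sizes |
| Too many connections | No connection limit | ss -tn \| wc -l | Configure limit_conn |
| SSL error | Certificate expired | openssl x509 -noout -dates -in <cert> | Renew the certificate |
| HTTPS failure | Port 443 not open | ss -tulpn \| grep :443 | Open the port |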

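10.2 End-to-End Case Study

Case: slow e-commerce site

# Symptom
# Users report that the site is slow and some requests time out

# Investigation
# 1. Check Nginx status
systemctl status nginx   # running normally

# 2. Check system resources
free -h   # enough memory
df -h     # enough disk space
uptime    # load is normal

# 3. Check the connection count
ss -Htn 'sport = :80' | wc -l   # 5000+ connections

# 4. Check request times in the access log
#    ($request_time is the second-to-last field in the log_format from section 13.2;
#    `tail -n` is used because `tail -f` never ends, so `sort` would never print)
tail -n 10000 /var/log/nginx/access.log | awk '{print $(NF-1)}' | sort -n | tail -10
# Many requests take longer than 5s

# 5. Check the backend
systemctl status php-fpm   # running normally
ss -tulpn | grep :9000     # port is listening

# 6. Check upstream response times ($upstream_response_time is the last field)
awk '{print $NF}' /var/log/nginx/access.log | sort -n | tail -20
# Backend response times are mostly above 3s

# 7. Check the database
mysql -e "SHOW PROCESSLIST;" | wc -l   # 500+ connections

# Root cause
# Too many database connections slow down backend processing

# Solution
# 1. Tune the database connection pool
# 2. Enable Nginx caching
# 3. Add backend servers
# 4. Rate-limit requests

# Configuration change
vi /etc/nginx/conf.d/cache.conf
# Add the cache settings (nginx directives, http context):
proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=my_cache:10m max_size=1g;
proxy_cache my_cache;
proxy_cache_valid 200 302 10m;
# Note: proxy_cache* only affects proxy_pass upstreams. If php-fpm is reached through
# fastcgi_pass, use the fastcgi_cache_path / fastcgi_cache / fastcgi_cache_valid equivalents.

# Reload and verify
nginx -t && systemctl reload nginx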
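
The log checks in this case can be condensed into one number: how many requests exceeded a given duration. The sketch below is illustrative only. The name `slow_request_ratio` is an assumption, and it assumes `$request_time` is the second-to-last field of each access-log line, as in the `main` log_format shown in section 13.2.

```shell
#!/bin/bash
# Hypothetical helper: read access-log lines on stdin and report how many requests
# took longer than THRESHOLD seconds (default 3).
# Assumes $request_time is the second-to-last field of every line.
slow_request_ratio() {
    local threshold="${1:-3}"
    awk -v t="$threshold" '
        NF < 2 { next }                                  # skip blank or malformed lines
        { total++; if ($(NF-1) + 0 > t + 0) slow++ }
        END {
            if (total == 0) { print "no requests"; exit }
            printf "%d/%d requests slower than %ss (%.1f%%)\n", slow, total, t, slow * 100 / total
        }'
}
```

Usage: `tail -n 10000 /var/log/nginx/access.log | slow_request_ratio 5`. Running it before and after a change such as enabling the cache gives a simple way to confirm whether the fix helped.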

11. Automated Troubleshooting Scripts

11.1 Comprehensive Nginx Diagnosis Script

#!/bin/bash
#===============================================================================
# Script name: nginx_diagnosis.sh
# Description: comprehensive Nginx fault diagnosis script
#===============================================================================

# Deliberately no `set -e`: checks such as `systemctl status` or `grep` return
# non-zero on an unhealthy system, and the diagnosis must keep going.

OUTPUT_DIR="/tmp/nginx_diagnosis_$(date +%Y%m%d_%H%M%S)"
mkdir -p "$OUTPUT_DIR" || exit 1
REPORT="$OUTPUT_DIR/diagnosis.log"

RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
NC='\033[0m'

log_info()  { echo -e "${GREEN}[INFO]${NC} $1"  | tee -a "$REPORT"; }
log_warn()  { echo -e "${YELLOW}[WARN]${NC} $1" | tee -a "$REPORT"; }
log_error() { echo -e "${RED}[ERROR]${NC} $1"   | tee -a "$REPORT"; }

{
    echo "=============================================="
    echo "  Nginx diagnosis report"
    echo "  Time: $(date)"
    echo "  Host: $(hostname)"
    echo "=============================================="
} | tee -a "$REPORT"

# 1. Service status
log_info "[1. Service status]"
systemctl status nginx --no-pager > "$OUTPUT_DIR/service_status.txt" 2>&1
cat "$OUTPUT_DIR/service_status.txt"

# 2. Configuration test
log_info "[2. Configuration test]"
if nginx -t > "$OUTPUT_DIR/config_test.txt" 2>&1; then
    log_info "✓ Configuration test passed"
else
    log_error "❌ Configuration test failed"
    cat "$OUTPUT_DIR/config_test.txt"
fi

# 3. Process information ([n]ginx keeps grep from matching itself)
log_info "[3. Process information]"
ps aux | grep '[n]ginx' > "$OUTPUT_DIR/process_info.txt" 2>&1
cat "$OUTPUT_DIR/process_info.txt"

# 4. Listening ports
log_info "[4. Listening ports]"
ss -tulpn | grep nginx > "$OUTPUT_DIR/port_listen.txt" 2>&1
cat "$OUTPUT_DIR/port_listen.txt"

# 5. Connection statistics (-H drops the header line so counts are exact)
log_info "[5. Connection statistics]"
{
    echo "Total connections: $(ss -Htn | wc -l)"
    echo "Port 80 connections: $(ss -Htn 'sport = :80' | wc -l)"
    echo "Port 443 connections: $(ss -Htn 'sport = :443' | wc -l)"
    echo "TIME_WAIT: $(ss -Htn state time-wait | wc -l)"
} | tee -a "$OUTPUT_DIR/connection_stats.txt"

# 6. Log analysis
log_info "[6. Log analysis]"
{
    echo "=== Last 20 error log entries ==="
    tail -20 /var/log/nginx/error.log 2>/dev/null || echo "No error log"
    echo ""
    echo "=== Status code distribution ==="
    awk '{print $9}' /var/log/nginx/access.log 2>/dev/null | sort | uniq -c | sort -rn | head -10
} | tee -a "$OUTPUT_DIR/log_analysis.txt"

# 7. Resource usage
log_info "[7. Resource usage]"
{
    echo "=== Memory ==="
    free -h
    echo ""
    echo "=== Disk ==="
    df -h /var/log /var/cache 2>/dev/null
    echo ""
    echo "=== Load ==="
    uptime
} | tee -a "$OUTPUT_DIR/resource_usage.txt"

# 8. Configuration check
log_info "[8. Configuration check]"
{
    echo "worker_processes: $(grep worker_processes /etc/nginx/nginx.conf 2>/dev/null)"
    echo "worker_connections: $(grep worker_connections /etc/nginx/nginx.conf 2>/dev/null)"
    echo "SSL certificates: $(grep -r ssl_certificate /etc/nginx/ 2>/dev/null | head -3)"
} | tee -a "$OUTPUT_DIR/config_check.txt"

# 9. Checklist
log_info "[9. Checklist]"
{
    echo "□ Service status: $(systemctl is-active nginx 2>/dev/null)"
    echo "□ Configuration test: $(nginx -t 2>&1 | grep -o 'syntax is ok' || echo 'failed')"
    echo "□ Port 80: $(ss -tulpn | grep -q ':80 ' && echo 'listening' || echo 'not listening')"
    echo "□ Port 443: $(ss -tulpn | grep -q ':443 ' && echo 'listening' || echo 'not listening')"
    echo "□ Disk usage: $(df /var/log | tail -1 | awk '{print $5}')"
    echo "□ Last error log entry: $(tail -1 /var/log/nginx/error.log 2>/dev/null | cut -c1-80)"
} | tee -a "$OUTPUT_DIR/checklist.txt"

# 10. Suggestions
log_info "[10. Troubleshooting suggestions]"
{
    echo "1. Service problems: systemctl status nginx"
    echo "2. Configuration problems: nginx -t"
    echo "3. Log analysis: tail -f /var/log/nginx/error.log"
    echo "4. Connection problems: ss -tulpn | grep nginx"
    echo "5. Performance problems: top -p \$(cat /run/nginx.pid)"
} | tee -a "$OUTPUT_DIR/suggestions.txt"

{
    echo ""
    echo "=============================================="
    echo "  Diagnosis complete"
    echo "  Report directory: $OUTPUT_DIR"
    echo "=============================================="
} | tee -a "$REPORT"

11.2 Nginx Health Check Script

#!/bin/bash
#===============================================================================
# Script name: nginx_health_check.sh
# Description: Nginx health check script (suitable for monitoring)
#===============================================================================

LOG_FILE="/var/log/nginx_health_check.log"
ALERT_EMAIL="admin@example.com"   # placeholder: set this to your alert recipient
EXIT_CODE=0

log() {
    echo "[$(date '+%Y-%m-%d %H:%M:%S')] $1" | tee -a "$LOG_FILE"
}

send_alert() {
    local subject=$1
    local message=$2
    if echo "$message" | mail -s "$subject" "$ALERT_EMAIL" 2>/dev/null; then
        log "⚠️  Alert sent: $subject"
    else
        log "⚠️  Could not send alert (is the mail command available?): $subject"
    fi
}

check_service() {
    local status
    status=$(systemctl is-active nginx 2>/dev/null)
    if [ "$status" != "active" ]; then
        log "❌ Nginx service status abnormal: $status"
        send_alert "Nginx service alert" "Nginx service status: $status"
        EXIT_CODE=1
    else
        log "✓ Nginx service is running"
    fi
}

check_port() {
    local port=$1
    if ! ss -tulpn | grep -q ":$port "; then
        log "❌ Port $port is not listening"
        send_alert "Nginx port alert" "Port $port is not listening"
        EXIT_CODE=1
    else
        log "✓ Port $port is listening"
    fi
}

check_config() {
    if ! nginx -t > /dev/null 2>&1; then
        log "❌ Nginx configuration test failed"
        send_alert "Nginx configuration alert" "Configuration test failed"
        EXIT_CODE=1
    else
        log "✓ Nginx configuration test passed"
    fi
}

check_error_log() {
    # Count entries at level error or above among the last 100 lines
    local error_count
    error_count=$(tail -100 /var/log/nginx/error.log 2>/dev/null | grep -cE '\[(error|crit|alert|emerg)\]')
    if [ "$error_count" -gt 10 ]; then
        log "⚠️  Error log abnormal: $error_count errors in the last 100 entries"
        send_alert "Nginx error log alert" "$error_count errors in the last 100 log entries"
        EXIT_CODE=1
    else
        log "✓ Error log normal: $error_count errors"
    fi
}

check_5xx() {
    # Field 9 is the status code in the default access log format
    local count_5xx
    count_5xx=$(tail -1000 /var/log/nginx/access.log 2>/dev/null | awk '$9 ~ /^5/ {count++} END {print count+0}')
    if [ "$count_5xx" -gt 50 ]; then
        log "⚠️  Too many 5xx errors: $count_5xx in the last 1000 requests"
        send_alert "Nginx 5xx alert" "5xx error count: $count_5xx"
        EXIT_CODE=1
    else
        log "✓ 5xx errors normal: $count_5xx"
    fi
}

# Main checks
log "========== Nginx health check started =========="

check_service
check_port 80
check_port 443
check_config
check_error_log
check_5xx

log "========== Nginx health check finished (exit code: $EXIT_CODE) =========="

exit $EXIT_CODE

12. Troubleshooting Flowchart

12.1 Overall Nginx Troubleshooting Flow

┌─────────────────────────────────────────────────────────────────┐
│                       Nginx fault report                        │
└─────────────────────────┬───────────────────────────────────────┘
                          │
                          ▼
┌─────────────────────────────────────────────────────────────────┐
│  1. Confirm the symptom                                         │
│     - Startup failure / runtime error / performance / network   │
└─────────────────────────┬───────────────────────────────────────┘
                          │
          ┌───────────────┼───────────────┐
          │               │               │
          ▼               ▼               ▼
   ┌─────────────┐ ┌─────────────┐ ┌─────────────┐
   │   Startup   │ │   Runtime   │ │ Performance │
   │   failure   │ │    error    │ │    issue    │
   └──────┬──────┘ └──────┬──────┘ └──────┬──────┘
          │               │               │
          ▼               ▼               ▼
      nginx -t       journalctl       top/htop
     config test    log analysis   resource check
          │               │               │
          ▼               ▼               ▼
     fix config    analyze errors    tune config
          │               │               │
          └───────────────┼───────────────┘
                          │
                          ▼
┌─────────────────────────────────────────────────────────────────┐
│  2. Verify the fix                                              │
│     - nginx -t && systemctl reload nginx                        │
│     - Test access / watch monitoring metrics                    │
└─────────────────────────┬───────────────────────────────────────┘
                          │
                          ▼
┌─────────────────────────────────────────────────────────────────┐
│  3. Record and summarize                                        │
│     - Root cause / resolution steps / preventive measures       │
└─────────────────────────────────────────────────────────────────┘

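13. Best Practices

13.1 Fault Prevention

| Practice | Description |
| --- | --- |
| Monitoring and alerting | Monitor service status, ports, and error rate |
| Regular configuration checks | Make nginx -t part of the change process |
| Log rotation | Keep logs from filling the disk |
| Configuration backups | Back up configuration files before every change |
| Staged rollout | Validate configuration changes in a test environment first |
| Performance baseline | Establish a baseline for key performance metrics |
| Emergency plan | Define an incident response procedure |
| Regular drills | Run fault drills on a regular schedule |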
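
The "configuration backups" practice can be scripted so it is never skipped. The following is a minimal sketch; the function name `backup_config` and the `.bak.<timestamp>` naming are assumptions, not something the original article prescribes.

```shell
#!/bin/bash
# Hypothetical helper: copy a configuration file to a timestamped backup next to it
# and print the backup path. The ".bak.<timestamp>" suffix means the copy does not
# match an `include conf.d/*.conf;` pattern, so Nginx will not load it by accident.
backup_config() {
    local src="$1"
    local dest="${src}.bak.$(date +%Y%m%d_%H%M%S)"
    cp -p "$src" "$dest" && echo "$dest"
}
```

A typical change would then look like `backup_config /etc/nginx/nginx.conf`, followed by the edit, and finally `nginx -t && systemctl reload nginx`. If the test fails, the printed backup path is what gets copied back.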

13.2 Configuration Tuning

# /etc/nginx/nginx.conf tuning suggestions

# 1. Process settings
user nginx;
worker_processes auto;
worker_rlimit_nofile 65535;

# 2. Event settings
events {
    worker_connections 65535;
    use epoll;
    multi_accept on;
}

# 3. HTTP settings
http {
    # Basic tuning
    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    keepalive_timeout 65;
    keepalive_requests 100;

    # Buffer tuning
    client_body_buffer_size 10M;
    client_max_body_size 100M;

    # Timeouts
    client_body_timeout 60s;
    client_header_timeout 60s;
    send_timeout 60s;

    # Proxy settings
    proxy_connect_timeout 60s;
    proxy_send_timeout 60s;
    proxy_read_timeout 60s;
    proxy_buffer_size 4k;
    proxy_buffers 4 32k;

    # Cache settings
    proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=my_cache:10m max_size=1g;

    # Logging
    log_format main '$remote_addr - $remote_user [$time_local] "$request" '
                    '$status $body_bytes_sent "$http_referer" '
                    '"$http_user_agent" "$http_x_forwarded_for" '
                    '$request_time $upstream_response_time';
    access_log /var/log/nginx/access.log main;
    error_log /var/log/nginx/error.log warn;
}
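
When sizing `worker_processes` and `worker_connections`, a rough upper bound on concurrent clients is workers × connections. `worker_connections` counts every connection a worker holds, including those to upstream servers, so a reverse proxy serves roughly half as many clients. The sketch below only does that arithmetic; the function name `max_clients` and the `static`/`proxy` modes are assumptions for illustration.

```shell
#!/bin/bash
# Hypothetical helper: rough upper bound on concurrent clients.
#   $1: number of worker processes
#   $2: worker_connections
#   $3: "static" (default) or "proxy"; a proxied request also holds an upstream
#       connection, so the estimate is halved.
max_clients() {
    local workers="$1" connections="$2" mode="${3:-static}"
    local total=$(( workers * connections ))
    if [ "$mode" = "proxy" ]; then
        total=$(( total / 2 ))
    fi
    echo "$total"
}
```

Usage: `max_clients "$(nproc)" 65535 proxy`. The real ceiling is lower than this estimate: each connection needs a file descriptor, so `worker_rlimit_nofile` and the system limits cap it as well.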

13.3 Recommended Monitoring Metrics

| Metric | Threshold | Alert level |
| --- | --- | --- |
| Service status | inactive | P0 |
| Port listening | not listening | P0 |
| 5xx error rate | >5% | P1 |
| Response time | >2s | P2 |
| CPU usage | >80% | P2 |
| Memory usage | >80% | P2 |
| Connections | >10000 | P2 |
| Disk usage | >85% | P2 |
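
The 5xx error rate in the table is a percentage, so a raw count is not enough to evaluate it. As a sketch under stated assumptions (the name `rate_5xx` is hypothetical, and the status code is assumed to be field 9 of each access-log line), the rate can be computed like this:

```shell
#!/bin/bash
# Hypothetical helper: percentage of 5xx responses among access-log lines on stdin.
# Assumes the status code is field 9; lines without a valid status are ignored.
rate_5xx() {
    awk '
        $9 ~ /^[1-5][0-9][0-9]$/ { total++; if ($9 >= 500) errors++ }
        END { if (total) printf "%.2f\n", errors * 100 / total; else print "0.00" }'
}
```

Usage: `rate=$(tail -n 1000 /var/log/nginx/access.log | rate_5xx)`. The value can then be compared with the 5% threshold, for example with `awk -v r="$rate" 'BEGIN { exit !(r > 5) }' && echo "P1 alert"`.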



Reposted from: 运维星火燎原, by 刘军军, "openEuler 欧拉操作系统 – Nginx 故障排查详解" (openEuler Operating System: Nginx Troubleshooting Explained)