# Ningxia Smart Livestock Farming Supervision Platform: Monitoring and Operations Guide

## Version History

| Version | Date | Changes | Author |
|------|------|----------|--------|
| v1.0 | 2025-01-19 | Initial version | Operations team |

## 1. Monitoring Overview

### 1.1 Monitoring Goals

- Keep the system running reliably 24/7
- Detect and handle system anomalies promptly
- Provide performance data to guide system optimization
- Protect the user experience and business continuity
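
### 1.2 Monitoring Scope

- **Infrastructure monitoring**: servers, network, storage
- **Application monitoring**: backend services, frontend applications, databases
- **Business monitoring**: key business metrics, user behavior
- **Security monitoring**: security events, abnormal access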

## 2. Monitoring Architecture

### 2.1 Monitoring Components

```mermaid
graph TB
    A[Application services] --> B[Log collection]
    A --> C[Metrics collection]
    B --> D[Log analysis]
    C --> E[Monitoring platform]
    D --> F[Alerting system]
    E --> F
    F --> G[Operations staff]
```

### 2.2 Technology Stack

- **Monitoring platform**: Prometheus + Grafana
- **Log collection**: ELK Stack (Elasticsearch + Logstash + Kibana)
- **Alerting**: Alertmanager + DingTalk/email
- **APM**: Node.js application performance monitoring
- **Health checks**: custom health-check endpoints

## 3. Infrastructure Monitoring

### 3.1 Server Monitoring

#### 3.1.1 Metrics

- **CPU**: load average, utilization breakdown
- **Memory**: memory utilization, available memory
- **Disk**: disk utilization, I/O performance
- **Network**: inbound/outbound traffic, connection count
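
CPU utilization is normally derived from two samples of the cumulative counters in Linux's `/proc/stat`, not from a single reading. The sketch below assumes the aggregate `cpu` line layout (user, nice, system, idle, iowait, irq, softirq, steal, ...) and counts idle + iowait as idle time; the function names and sample lines are illustrative, not part of the platform.

```javascript
// Parse the aggregate "cpu" line of /proc/stat into idle and total jiffies.
function parseCpuLine(line) {
  const fields = line.trim().split(/\s+/);
  if (fields[0] !== 'cpu') throw new Error('expected the aggregate "cpu" line');
  // user nice system idle iowait irq softirq steal; guest time is already
  // included in user/nice, so only the first 8 counters are summed
  const values = fields.slice(1, 9).map(Number);
  const idle = values[3] + (values[4] || 0); // idle + iowait
  const total = values.reduce((sum, v) => sum + v, 0);
  return { idle, total };
}

// CPU utilization (%) between two samples taken some interval apart.
function cpuUtilization(prevLine, currLine) {
  const prev = parseCpuLine(prevLine);
  const curr = parseCpuLine(currLine);
  const totalDelta = curr.total - prev.total;
  const idleDelta = curr.idle - prev.idle;
  if (totalDelta <= 0) return 0;
  return (1 - idleDelta / totalDelta) * 100;
}

// Two hypothetical samples: 200 jiffies elapsed, 120 of them idle.
const s1 = 'cpu  100 0 50 800 50 0 0 0 0 0';
const s2 = 'cpu  160 0 70 900 70 0 0 0 0 0';
console.log(cpuUtilization(s1, s2).toFixed(1)); // 40.0
```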

#### 3.1.2 Alert Thresholds

| Metric | Warning threshold | Critical threshold | Action |
|------|----------|----------|----------|
| CPU utilization | 70% | 85% | Auto-scale / manual intervention |
| Memory utilization | 75% | 90% | Restart service / scale up |
| Disk utilization | 80% | 95% | Clean up logs / expand storage |
| Network latency | 100 ms | 500 ms | Check network / switch nodes |
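
The table maps each sample to one of three states. Below is a minimal sketch of that mapping. It assumes values are already in % (or ms for latency) and that a value must be strictly above a threshold to cross it, which matches the `> 85` comparison used in the Prometheus rules. `THRESHOLDS` and `classify` are illustrative names, not an existing module.

```javascript
// Warning / critical thresholds from the table above (% or ms).
const THRESHOLDS = {
  cpu: { warning: 70, critical: 85 },
  memory: { warning: 75, critical: 90 },
  disk: { warning: 80, critical: 95 },
  latencyMs: { warning: 100, critical: 500 }
};

// Return 'critical', 'warning' or 'ok' for one metric sample.
function classify(metric, value) {
  const t = THRESHOLDS[metric];
  if (!t) throw new Error(`unknown metric: ${metric}`);
  if (value > t.critical) return 'critical';
  if (value > t.warning) return 'warning';
  return 'ok';
}

console.log(classify('cpu', 72));       // warning
console.log(classify('disk', 96));      // critical
console.log(classify('latencyMs', 40)); // ok
```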

### 3.2 Database Monitoring

#### 3.2.1 MySQL Metrics

- **Connections**: current connections, maximum connections
- **Query performance**: slow queries, QPS, TPS
- **Locks**: deadlocks, lock wait time
- **Replication**: primary/replica lag, replication errors
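
#### 3.2.2 Monitoring Configuration

```sql
-- Enable the slow query log. SET GLOBAL is lost on restart, so also set these
-- in my.cnf (or use SET PERSIST on MySQL 8.0+)
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 2;  -- seconds; applies to connections opened after the change

-- Connection monitoring
SHOW GLOBAL STATUS LIKE 'Threads_connected';
SHOW GLOBAL STATUS LIKE 'Max_used_connections';
SHOW GLOBAL VARIABLES LIKE 'max_connections';

-- Query performance (without GLOBAL, SHOW STATUS returns session-scoped values)
SHOW GLOBAL STATUS LIKE 'Questions';
SHOW GLOBAL STATUS LIKE 'Uptime';
SHOW GLOBAL STATUS LIKE 'Slow_queries';
```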
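
`Questions / Uptime` only gives the average QPS since the server started. To get the current rate, sample `Questions` twice and divide the difference by the elapsed time. The sketch below shows that arithmetic. The function names and sample numbers are hypothetical, and the samples are assumed to come from the `SHOW GLOBAL STATUS` queries above.

```javascript
// Average QPS since server start, from a single status sample.
function averageQps(questions, uptimeSeconds) {
  return uptimeSeconds > 0 ? questions / uptimeSeconds : 0;
}

// Current QPS from two samples of the Questions counter taken intervalSeconds apart.
function currentQps(prevQuestions, currQuestions, intervalSeconds) {
  if (intervalSeconds <= 0) throw new Error('interval must be positive');
  // The counter resets when MySQL restarts; treat a decrease as "unknown".
  if (currQuestions < prevQuestions) return null;
  return (currQuestions - prevQuestions) / intervalSeconds;
}

// Example: 1,200,000 questions after one day of uptime,
// then 1,206,000 sixty seconds later.
console.log(averageQps(1200000, 86400).toFixed(2)); // 13.89
console.log(currentQps(1200000, 1206000, 60));      // 100
```

A current-rate value like this is what the response-time and throughput targets in section 8 should be compared against; the since-start average hides recent spikes.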

## 4. Application Monitoring

### 4.1 Backend Service Monitoring

#### 4.1.1 Node.js Application Monitoring

```javascript
const express = require('express');

const app = express();

// Performance-monitoring middleware: logs method, path, status and duration
const performanceMonitor = (req, res, next) => {
  const start = Date.now();

  // 'finish' fires once the response has been handed off to the OS
  res.on('finish', () => {
    const duration = Date.now() - start;
    console.log(`${req.method} ${req.path} - ${res.statusCode} - ${duration}ms`);
  });

  next();
};

// Register the middleware before the routes so that every request is timed
app.use(performanceMonitor);

// Health-check endpoint
app.get('/health', (req, res) => {
  const healthCheck = {
    uptime: process.uptime(),      // seconds since the process started
    message: 'OK',
    timestamp: Date.now(),
    memory: process.memoryUsage(), // rss, heapTotal, heapUsed, ... (bytes)
    cpu: process.cpuUsage()        // cumulative user/system CPU time (microseconds)
  };

  res.status(200).json(healthCheck);
});
```

#### 4.1.2 Key Metrics

- **Response time**: distribution of API response times
- **Error rate**: counts of HTTP error status codes
- **Throughput**: requests per second (RPS)
- **Memory usage**: heap and non-heap memory usage
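
The request metrics above can all be computed from the per-request records that the middleware in 4.1.1 logs (status and duration). The sketch below assumes the records were collected over a fixed window. It counts only 5xx responses as errors (whether 4xx should count is a policy choice) and uses the nearest-rank method for percentiles. `summarize` and the record shape are illustrative.

```javascript
// Nearest-rank percentile: smallest value with at least p% of values <= it.
function percentile(sortedValues, p) {
  if (sortedValues.length === 0) return 0;
  const rank = Math.ceil((p / 100) * sortedValues.length);
  return sortedValues[Math.max(0, rank - 1)];
}

// Summarize records { status, durationMs } collected over `windowSeconds`.
function summarize(records, windowSeconds) {
  const durations = records.map((r) => r.durationMs).sort((a, b) => a - b);
  const errors = records.filter((r) => r.status >= 500).length;
  return {
    rps: records.length / windowSeconds,
    errorRate: records.length ? errors / records.length : 0,
    p50: percentile(durations, 50),
    p95: percentile(durations, 95)
  };
}

// Hypothetical 10-second window with 10 requests, the last one a 500.
const sample = [120, 80, 95, 300, 60, 110, 90, 70, 150, 1000].map((d, i) => ({
  status: i === 9 ? 500 : 200,
  durationMs: d
}));
console.log(summarize(sample, 10)); // { rps: 1, errorRate: 0.1, p50: 95, p95: 1000 }
```

With only 10 samples the p95 is the single slowest request, so in practice percentiles are computed over larger windows or from histograms (as Prometheus does).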
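
### 4.2 Frontend Application Monitoring

#### 4.2.1 Performance Monitoring

```javascript
// Hypothetical collector endpoint; replace with the platform's real URL
const MONITOR_ENDPOINT = '/api/monitor';

// Report data without blocking page unload; fall back to fetch if sendBeacon is unavailable or fails
function sendToMonitor(payload) {
  const body = JSON.stringify(payload);
  if (!navigator.sendBeacon || !navigator.sendBeacon(MONITOR_ENDPOINT, body)) {
    fetch(MONITOR_ENDPOINT, { method: 'POST', body, keepalive: true }).catch(() => {});
  }
}
const sendMetrics = sendToMonitor;
const sendError = sendToMonitor;

// Page-load performance monitoring
window.addEventListener('load', () => {
  // loadEventEnd is still 0 while load handlers are running, so read it on the next tick
  setTimeout(() => {
    const [perfData] = performance.getEntriesByType('navigation');
    if (!perfData) return;
    sendMetrics({
      type: 'page_load',
      duration: perfData.loadEventEnd - perfData.fetchStart, // ms
      url: window.location.href
    });
  }, 0);
});

// Error monitoring. Listening in the capture phase (third argument) also catches
// resource-loading errors from <img>/<script>/<link>, which do not bubble and have no event.error
window.addEventListener('error', (event) => {
  sendError({
    message: event.error ? event.error.message : (event.message || 'Resource failed to load'),
    stack: event.error ? event.error.stack : undefined,
    source: event.target && (event.target.src || event.target.href),
    url: window.location.href,
    timestamp: Date.now()
  });
}, true);
```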

#### 4.2.2 User Experience Monitoring

- **Page load time**: first-screen load time, full load time
- **JavaScript errors**: runtime errors, resource-loading errors
- **User behavior**: page views, time on page
- **Browser compatibility**: usage across different browsers

## 5. Business Monitoring

### 5.1 Key Business Metrics

#### 5.1.1 User Metrics

- **Registered users**: daily new registrations, monthly active users
- **Login success rate**: ratio of successful to failed logins
- **User retention**: daily, weekly, and monthly retention
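
Retention is the share of a registration cohort that returns N days later. Under one common convention, daily, weekly and monthly retention correspond to N = 1, 7 and 30. The sketch below assumes registration dates and per-user activity dates are available as `YYYY-MM-DD` strings. The data shapes and the `dayNRetention` name are hypothetical.

```javascript
// Add `days` to a YYYY-MM-DD date string (computed in UTC to avoid DST issues).
function addDays(dateStr, days) {
  const d = new Date(`${dateStr}T00:00:00Z`);
  d.setUTCDate(d.getUTCDate() + days);
  return d.toISOString().slice(0, 10);
}

// Day-N retention for the cohort that registered on `cohortDate`:
// users active exactly N days after registering / users who registered that day.
// registrations: Map userId -> registration date
// activity:      Map userId -> Set of active dates
function dayNRetention(registrations, activity, cohortDate, n) {
  const cohort = [...registrations]
    .filter(([, date]) => date === cohortDate)
    .map(([id]) => id);
  if (cohort.length === 0) return 0;
  const target = addDays(cohortDate, n);
  const retained = cohort.filter((id) => (activity.get(id) || new Set()).has(target)).length;
  return retained / cohort.length;
}

const registrations = new Map([
  ['u1', '2025-01-01'], ['u2', '2025-01-01'], ['u3', '2025-01-01'], ['u4', '2025-01-02']
]);
const activity = new Map([
  ['u1', new Set(['2025-01-02', '2025-01-08'])],
  ['u2', new Set(['2025-01-02'])],
  ['u4', new Set(['2025-01-03'])]
]);
console.log(dayNRetention(registrations, activity, '2025-01-01', 1)); // 0.6666666666666666
console.log(dayNRetention(registrations, activity, '2025-01-01', 7)); // 0.3333333333333333
```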
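
#### 5.1.2 Business Function Metrics

- **Farm management**: number of newly added farms, update frequency
- **Monitoring data**: data-reporting success rate, data completeness
- **Report generation**: report generation success rate, generation time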

### 5.2 Business Alert Rules

```yaml
# Example business alert configuration
alerts:
  - name: High login failure rate
    condition: login_failure_rate > 0.1
    duration: 5m
    severity: warning

  - name: Data reporting interrupted
    condition: data_upload_count == 0
    duration: 10m
    severity: critical

  - name: Report generation failures
    condition: report_generation_failure_rate > 0.05
    duration: 3m
    severity: warning
```
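
The `duration` field means the condition must hold continuously for that long before the alert fires, similar to the `for` clause in Prometheus rules. Below is a minimal evaluator sketch. It assumes the rule is evaluated at regular intervals and that the condition is supplied as a function rather than parsed from the string. `createRuleState` is a hypothetical helper.

```javascript
// Track one alert rule across evaluations.
// condition: (metrics) => boolean; durationMs: how long it must stay true.
function createRuleState(name, condition, durationMs) {
  let pendingSince = null; // time at which the condition first became true
  return function evaluate(metrics, nowMs) {
    if (!condition(metrics)) {
      pendingSince = null; // any false evaluation resets the timer
      return { name, state: 'inactive' };
    }
    if (pendingSince === null) pendingSince = nowMs;
    const state = nowMs - pendingSince >= durationMs ? 'firing' : 'pending';
    return { name, state };
  };
}

// "High login failure rate": login_failure_rate > 0.1 for 5 minutes
const minute = 60 * 1000;
const loginRule = createRuleState(
  'High login failure rate',
  (m) => m.login_failure_rate > 0.1,
  5 * minute
);
console.log(loginRule({ login_failure_rate: 0.2 }, 0).state);           // pending
console.log(loginRule({ login_failure_rate: 0.2 }, 3 * minute).state);  // pending
console.log(loginRule({ login_failure_rate: 0.2 }, 5 * minute).state);  // firing
console.log(loginRule({ login_failure_rate: 0.05 }, 6 * minute).state); // inactive
```

The pending state prevents a single noisy sample from paging anyone; only a condition that persists for the full duration escalates.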

## 6. Log Management

### 6.1 Log Categories

#### 6.1.1 Application Logs

- **Access logs**: records of HTTP requests
- **Error logs**: application exceptions and errors
- **Business logs**: records of key business operations
- **Performance logs**: performance-related data

#### 6.1.2 System Logs

- **System logs**: operating-system-level logs
- **Database logs**: database operation and error logs
- **Security logs**: records of security-related events

### 6.2 Log Format Standard

```javascript
// Unified log format (one example log entry)
const logFormat = {
  timestamp: '2025-01-19T10:30:00.000Z', // ISO 8601, UTC
  level: 'INFO',
  service: 'backend-api',
  module: 'user-management',
  message: 'User login successful',
  userId: '12345',
  ip: '192.168.1.100',
  userAgent: 'Mozilla/5.0...',
  requestId: 'req-uuid-12345',           // correlates all log lines of one request
  duration: 150                          // milliseconds
};
```
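
Below is a minimal sketch of a logger that emits one JSON object per line in this format, which is the shape the Logstash `json` codec in 6.3 reads. `createLogger` is a hypothetical helper; a production service would more likely use an established logging library. The optional `write` parameter exists only so the output can be redirected or tested.

```javascript
// Create a logger that writes one JSON log entry per line.
// service/module are fixed per logger; write is the output sink (stdout by default).
function createLogger(service, module, write = (line) => process.stdout.write(line + '\n')) {
  const log = (level) => (message, fields = {}) => {
    const entry = {
      timestamp: new Date().toISOString(), // ISO 8601 in UTC
      level,
      service,
      module,
      message,
      ...fields // extra fields such as userId, ip, userAgent, requestId, duration
    };
    write(JSON.stringify(entry));
    return entry;
  };
  return { info: log('INFO'), warn: log('WARN'), error: log('ERROR') };
}

const logger = createLogger('backend-api', 'user-management');
logger.info('User login successful', { userId: '12345', requestId: 'req-uuid-12345', duration: 150 });
```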

### 6.3 Log Collection Configuration

```conf
# Logstash pipeline configuration example
input {
  file {
    path => "/var/log/nxxmdata/*.log"
    start_position => "beginning"  # only applies to files Logstash has not read before
    codec => json                  # one JSON log entry per line (see 6.2)
  }
}

filter {
  if [level] == "ERROR" {
    mutate {
      add_tag => ["error"]
    }
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "nxxmdata-logs-%{+YYYY.MM.dd}"  # one index per day
  }
}
```

## 7. Alerting System

### 7.1 Alert Levels

#### 7.1.1 Severity Levels

- **P0 - Emergency**: the system is completely unavailable; requires immediate action
- **P1 - Critical**: core functionality is broken and users are affected
- **P2 - Warning**: degraded performance or failure of non-core functionality
- **P3 - Info**: needs attention but does not affect normal use

#### 7.1.2 Notification Channels

| Level | Notification channels | Response time |
|------|----------|----------|
| P0 | Phone + SMS + DingTalk | Within 5 minutes |
| P1 | SMS + DingTalk + email | Within 15 minutes |
| P2 | DingTalk + email | Within 30 minutes |
| P3 | Email | Within 2 hours |
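
The table can be encoded as a routing map, so notification channels and response deadlines come from one place. Below is a sketch. The channel identifiers and `routeAlert` are illustrative, and actually sending phone, SMS or DingTalk messages is out of scope.

```javascript
// Notification policy from the table above.
const ROUTING = {
  P0: { channels: ['phone', 'sms', 'dingtalk'], respondWithinMin: 5 },
  P1: { channels: ['sms', 'dingtalk', 'email'], respondWithinMin: 15 },
  P2: { channels: ['dingtalk', 'email'], respondWithinMin: 30 },
  P3: { channels: ['email'], respondWithinMin: 120 }
};

// Decide where to send an alert and when a response is due.
function routeAlert(alert) {
  const policy = ROUTING[alert.level];
  if (!policy) throw new Error(`unknown alert level: ${alert.level}`);
  return {
    channels: policy.channels,
    respondBy: new Date(alert.firedAt.getTime() + policy.respondWithinMin * 60 * 1000)
  };
}

const r = routeAlert({ level: 'P1', firedAt: new Date('2025-01-19T10:00:00Z') });
console.log(r.channels, r.respondBy.toISOString());
// [ 'sms', 'dingtalk', 'email' ] 2025-01-19T10:15:00.000Z
```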

### 7.2 Alert Rule Configuration

```yaml
# Prometheus alerting rules
# Assumes node_exporter (CPU) and mysqld_exporter (MySQL) metrics are being scraped
groups:
  - name: system.rules
    rules:
      - alert: HighCPUUsage
        # CPU utilization = 100% minus the share of time spent idle
        expr: 100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 85
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High CPU usage"
          description: "CPU usage on {{ $labels.instance }} has been above 85% for 5 minutes"

      - alert: DatabaseConnectionHigh
        expr: mysql_global_status_threads_connected / mysql_global_variables_max_connections > 0.8
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "High database connection count"
          description: "Database connections exceed 80% of max_connections"
```

## 8. Performance Optimization

### 8.1 Performance Metrics

#### 8.1.1 Response Time Targets

- **API response time**: target < 200 ms
- **Database query time**: target < 100 ms
- **Page load time**: target < 3 s

#### 8.1.2 Concurrency

- **Maximum concurrent users**: 1000+
- **Database connection pool**: size the pool appropriately for the workload
- **Cache hit rate**: target > 90%
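
For Redis, the cache hit rate can be derived from the `keyspace_hits` and `keyspace_misses` counters in the `stats` section of `INFO`. The sketch below parses raw `INFO` text and assumes that text has already been fetched, for example with `redis-cli INFO stats`. These counters are cumulative since startup, so the difference between two samples gives the hit rate for an interval.

```javascript
// Parse "key:value" lines of Redis INFO output into an object of strings.
function parseRedisInfo(text) {
  const info = {};
  for (const line of text.split(/\r?\n/)) {
    if (!line || line.startsWith('#')) continue; // skip blanks and section headers
    const idx = line.indexOf(':');
    if (idx > 0) info[line.slice(0, idx)] = line.slice(idx + 1);
  }
  return info;
}

// Hit rate = hits / (hits + misses); null when there have been no lookups.
function cacheHitRate(infoText) {
  const info = parseRedisInfo(infoText);
  const hits = Number(info.keyspace_hits || 0);
  const misses = Number(info.keyspace_misses || 0);
  const lookups = hits + misses;
  return lookups === 0 ? null : hits / lookups;
}

const sampleInfo = '# Stats\r\nkeyspace_hits:9400\r\nkeyspace_misses:600\r\n';
console.log(cacheHitRate(sampleInfo)); // 0.94
```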

### 8.2 Optimization Strategies

```javascript
// Redis client configuration. This uses the node_redis v3 API; in redis v4+
// `retry_strategy` was replaced by `socket.reconnectStrategy`.
const redis = require('redis');
const mysql = require('mysql');

const client = redis.createClient({
  host: 'localhost',
  port: 6379,
  retry_strategy: (options) => {
    // Returning an Error stops reconnecting and fails queued commands with it
    if (options.error && options.error.code === 'ECONNREFUSED') {
      return new Error('Redis server refused the connection');
    }
    // Give up after one hour of retrying
    if (options.total_retry_time > 1000 * 60 * 60) {
      return new Error('Retry time exhausted');
    }
    // Otherwise reconnect after attempt * 100 ms, capped at 3 s
    return Math.min(options.attempt * 100, 3000);
  }
});

// Database connection pool configuration (`mysql` package)
const pool = mysql.createPool({
  connectionLimit: 10,
  host: 'localhost',
  user: process.env.DB_USER,         // do not hard-code root credentials
  password: process.env.DB_PASSWORD,
  database: 'nxxmdata',
  acquireTimeout: 60000,             // ms to wait when acquiring a pooled connection
  connectTimeout: 60000              // ms to wait for the initial connection
});
```
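
The Redis client and connection pool above are usually combined in a cache-aside pattern. Reads check the cache first, fall back to the database on a miss, then store the result with a TTL. The sketch below shows that pattern with an in-memory `Map` standing in for Redis and a stub loader standing in for the MySQL query, so it runs on its own. `TtlCache`, `getFarm` and the 60-second TTL are hypothetical.

```javascript
// Minimal in-memory stand-in for Redis GET/SET with expiry.
class TtlCache {
  constructor(now = () => Date.now()) {
    this.store = new Map();
    this.now = now;
    this.hits = 0;
    this.misses = 0;
  }
  get(key) {
    const entry = this.store.get(key);
    if (entry && entry.expiresAt > this.now()) {
      this.hits++;
      return entry.value;
    }
    this.store.delete(key); // expired or absent
    this.misses++;
    return undefined;
  }
  set(key, value, ttlMs) {
    this.store.set(key, { value, expiresAt: this.now() + ttlMs });
  }
}

// Cache-aside read: cache first, then the loader (e.g. a DB query) on a miss.
async function cacheAside(cache, key, ttlMs, loader) {
  const cached = cache.get(key);
  if (cached !== undefined) return cached;
  const value = await loader(key);
  cache.set(key, value, ttlMs);
  return value;
}

// Hypothetical usage: farm records cached for 60 s.
const cache = new TtlCache();
let dbCalls = 0;
const getFarm = (id) => cacheAside(cache, `farm:${id}`, 60000, async () => {
  dbCalls++; // stands in for pool.query(...)
  return { id, name: `farm-${id}` };
});

getFarm(1).then(() => getFarm(1)).then((farm) => {
  console.log(farm.name, dbCalls, cache.hits, cache.misses); // farm-1 1 1 1
});
```

The `hits` and `misses` counters are the same inputs as the hit-rate metric in 8.1.2. The TTL bounds how stale a cached record can get after the database changes.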

## 9. Incident Handling

### 9.1 Incident Categories

#### 9.1.1 Common Incident Types

- **Service unavailable**: application crash, server outage
- **Performance issues**: slow responses, timeouts
- **Data issues**: data loss, data inconsistency
- **Security issues**: attacks, data leaks
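
#### 9.1.2 Incident Handling Process

1. **Detection**: monitoring alerts, user reports
2. **Confirmation**: verify the scope and impact of the incident
3. **Mitigation**: restore service as quickly as possible
4. **Root-cause analysis**: determine why the incident happened
5. **Permanent fix**: implement a fundamental solution
6. **Review and improvement**: update monitoring and preventive measures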

### 9.2 Emergency Response Plans

#### 9.2.1 Service Unavailable

```bash
# Check service status
systemctl status nxxmdata-backend
systemctl status nginx
systemctl status mysql

# Restart services (validate the nginx configuration first)
systemctl restart nxxmdata-backend
nginx -t && systemctl restart nginx

# Follow the logs (Ctrl+C to stop)
tail -f /var/log/nxxmdata/error.log
tail -f /var/log/nginx/error.log
```

#### 9.2.2 Database Failure

```bash
# Check database status
mysql -u root -p -e "SHOW PROCESSLIST;"
mysql -u root -p -e "SHOW ENGINE INNODB STATUS\G"

# Check disk space
df -h
du -sh /var/lib/mysql/

# Back up the database (--single-transaction takes a consistent InnoDB snapshot without locking tables)
mysqldump -u root -p --single-transaction nxxmdata > backup.sql

# Restore the database (replaces the tables contained in the dump; back up first)
mysql -u root -p nxxmdata < backup.sql
```

## 10. Operations Automation

### 10.1 Automated Deployment

#### 10.1.1 CI/CD Pipeline

```yaml
# GitHub Actions workflow
name: Deploy to Production

on:
  push:
    branches: [main]

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: '18'
      - name: Install dependencies
        run: npm install
      - name: Run tests
        run: npm test
      - name: Build application
        run: npm run build
      - name: Deploy to server
        # Requires an SSH key for user@server to be available on the runner
        run: |
          ssh user@server 'cd /app && git pull && npm install && pm2 restart all'
```

#### 10.1.2 Automation Script

```bash
#!/bin/bash
# Automated deployment script
set -euo pipefail

echo "Starting deployment..."

# Back up the current version (create the backup root if it does not exist)
BACKUP_DIR="/app/backup/$(date +%Y%m%d_%H%M%S)"
mkdir -p /app/backup
cp -r /app/current "$BACKUP_DIR"

# Update the code
cd /app/current
git pull origin main

# Install dependencies, including devDependencies, which the build usually needs
npm install

# Build the application, then drop devDependencies
npm run build
npm prune --omit=dev

# Restart services
pm2 restart all

# Health check: retry for up to ~30 s instead of a single fixed wait
for i in $(seq 1 10); do
  if curl -fsS http://localhost:3000/health > /dev/null; then
    echo "Deployment complete!"
    exit 0
  fi
  sleep 3
done

echo "Health check failed; the previous version is in $BACKUP_DIR" >&2
exit 1
```
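
The post-deployment health check in the script above can also run from a Node.js deployment tool, with retries instead of a single wait. Below is a sketch of a generic poller. It assumes the check is an async function that resolves `true` when the service is healthy, for example one that calls `fetch('http://localhost:3000/health')` (`fetch` is global in Node 18). `waitForHealthy` and its options are hypothetical.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Poll `check` until it succeeds or `attempts` runs out.
// Errors thrown by `check` (e.g. connection refused) count as failed attempts.
async function waitForHealthy(check, { attempts = 10, intervalMs = 3000 } = {}) {
  for (let i = 1; i <= attempts; i++) {
    try {
      if (await check()) return i; // number of attempts it took
    } catch (err) {
      // service not reachable yet; retry below
    }
    if (i < attempts) await sleep(intervalMs);
  }
  throw new Error(`service not healthy after ${attempts} attempts`);
}

// Example check against the /health endpoint from 4.1.1.
const httpCheck = async () => (await fetch('http://localhost:3000/health')).ok;
// waitForHealthy(httpCheck).then((n) => console.log(`healthy after ${n} attempt(s)`));
```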

### 10.2 Automated Monitoring

#### 10.2.1 Health-Check Script

```bash
#!/bin/bash
# System health-check script

# Check service status and restart any service that is not running
services=("nxxmdata-backend" "nginx" "mysql")
for service in "${services[@]}"; do
  if ! systemctl is-active --quiet "$service"; then
    echo "Warning: $service is not running; restarting it"
    systemctl restart "$service"
  fi
done

# Check disk usage of the root filesystem (-P keeps each filesystem on one line)
disk_usage=$(df -P / | awk 'NR==2 {print $5}' | sed 's/%//')
if [ "$disk_usage" -gt 80 ]; then
  echo "Warning: disk usage is above 80%"
  # Delete the platform's own log files older than 7 days (not everything under /var/log)
  find /var/log/nxxmdata -name "*.log" -mtime +7 -delete
fi

# Check memory usage (used / total from the "Mem:" line of free)
memory_usage=$(free | awk '/^Mem:/ {printf "%.0f", $3*100/$2}')
if [ "$memory_usage" -gt 85 ]; then
  echo "Warning: memory usage is above 85%"
fi
```
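
The shell check above only looks at `/`. The sketch below parses full `df -P` output (POSIX format, one filesystem per line) and reports every mount point over the threshold. Running `df` itself is left out, and the function name and sample output are hypothetical.

```javascript
// Parse `df -P` output and return mount points whose usage exceeds `thresholdPct`.
function mountsOverThreshold(dfOutput, thresholdPct) {
  return dfOutput
    .trim()
    .split('\n')
    .slice(1) // skip the header line
    .map((line) => line.trim().split(/\s+/))
    .filter((cols) => cols.length >= 6)
    .map((cols) => ({
      mount: cols.slice(5).join(' '), // rejoin in case the mount path contains spaces
      usedPct: parseInt(cols[4], 10)  // "88%" -> 88
    }))
    .filter((m) => m.usedPct > thresholdPct);
}

const sampleDf = `Filesystem     1024-blocks     Used Available Capacity Mounted on
/dev/sda1         51475068 42710000   6127468      88% /
/dev/sdb1        103081248 20000000  77822000      21% /var/lib/mysql`;
console.log(mountsOverThreshold(sampleDf, 80)); // [ { mount: '/', usedPct: 88 } ]
```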

## 11. Backup and Recovery

### 11.1 Backup Strategy

#### 11.1.1 Data Backups

- **Full backups**: run daily at 2:00 a.m.
- **Incremental backups**: run every 4 hours
- **Retention**: 7 days locally, 30 days remotely
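
Retention can also be applied using the timestamp embedded in backup file names instead of file modification times. The sketch below assumes the `nxxmdata_YYYYMMDD_HHMMSS.sql.gz` naming used by the backup script in 11.1.2 and treats the timestamps as UTC for simplicity. `expiredBackups` is a hypothetical helper.

```javascript
// Extract the timestamp from names like nxxmdata_20250119_020000.sql.gz (treated as UTC).
function backupTime(fileName) {
  const m = /_(\d{4})(\d{2})(\d{2})_(\d{2})(\d{2})(\d{2})\.sql\.gz$/.exec(fileName);
  if (!m) return null;
  const [, y, mo, d, h, mi, s] = m.map(Number);
  return new Date(Date.UTC(y, mo - 1, d, h, mi, s));
}

// Backups older than `retentionDays` relative to `now` (7 locally, 30 remotely).
function expiredBackups(fileNames, now, retentionDays) {
  const cutoff = now.getTime() - retentionDays * 24 * 60 * 60 * 1000;
  return fileNames.filter((name) => {
    const t = backupTime(name);
    return t !== null && t.getTime() < cutoff; // ignore files with unexpected names
  });
}

const files = [
  'nxxmdata_20250110_020000.sql.gz',
  'nxxmdata_20250115_020000.sql.gz',
  'nxxmdata_20250119_020000.sql.gz',
  'notes.txt'
];
console.log(expiredBackups(files, new Date('2025-01-19T03:00:00Z'), 7));
// [ 'nxxmdata_20250110_020000.sql.gz' ]
```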

#### 11.1.2 Backup Script

```bash
#!/bin/bash
# Database backup script
set -euo pipefail

DATE=$(date +%Y%m%d_%H%M%S)
BACKUP_DIR="/backup/mysql"
DB_NAME="nxxmdata"
BACKUP_FILE="$BACKUP_DIR/${DB_NAME}_$DATE.sql.gz"

# Create the backup directory
mkdir -p "$BACKUP_DIR"

# Run the backup and compress it in one pass.
# Credentials are read from an option file ([client] user=... password=...) instead of
# -p$MYSQL_PASSWORD, which exposes the password in the process list.
# /etc/mysql/backup.cnf is an assumed path; --defaults-extra-file must be the first option.
mysqldump --defaults-extra-file=/etc/mysql/backup.cnf \
  --single-transaction \
  --routines \
  --triggers \
  "$DB_NAME" | gzip > "$BACKUP_FILE"

# Upload to cloud storage
aws s3 cp "$BACKUP_FILE" s3://nxxmdata-backup/

# Remove local backups older than 7 days
find "$BACKUP_DIR" -name "*.sql.gz" -mtime +7 -delete

echo "Backup complete: $(basename "$BACKUP_FILE")"
```

### 11.2 Recovery Process

#### 11.2.1 Database Recovery

```bash
# Stop the application so nothing writes to the database during the restore
systemctl stop nxxmdata-backend

# Restore the database (backup_file.sql.gz stands for the chosen backup file)
gunzip -c backup_file.sql.gz | mysql -u root -p nxxmdata

# Verify data integrity (spot-check row counts of key tables)
mysql -u root -p -e "SELECT COUNT(*) FROM nxxmdata.users;"

# Start the service again
systemctl start nxxmdata-backend
```

## 12. Security Operations

### 12.1 Security Monitoring

#### 12.1.1 Security Event Monitoring

- **Abnormal logins**: logins from unusual locations, brute-force attempts
- **SQL injection**: detection of malicious SQL queries
- **XSS attacks**: detection of cross-site scripting attacks
- **File uploads**: detection of malicious file uploads
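
#### 12.1.2 Security Log Analysis

```bash
# Find abnormal requests: count (IP, path, status) for 4xx/5xx responses
# ($9 is the status field in nginx's default "combined" log format)
awk '$9 >= 400 {print $1, $7, $9}' /var/log/nginx/access.log | sort | uniq -c | sort -nr

# Check failed login attempts (sshd logs "Failed password", PAM logs "authentication failure")
grep -E "Failed password|authentication failure" /var/log/auth.log

# Monitor file-system changes: files under /app modified in the last 24 hours
find /app -type f -mtime -1 -ls
```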
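
The commands above are one-off searches. To flag brute-force attempts continuously, count failed logins per IP within a sliding time window. The sketch below assumes failed-login events have already been extracted from the logs as `{ ip, time }` objects. The threshold of 5 failures in 10 minutes is an illustrative value, not a policy defined in this document.

```javascript
// Return IPs with at least `maxFailures` failed logins inside any window of `windowMs`.
function detectBruteForce(events, maxFailures = 5, windowMs = 10 * 60 * 1000) {
  const byIp = new Map();
  for (const e of events) {
    if (!byIp.has(e.ip)) byIp.set(e.ip, []);
    byIp.get(e.ip).push(e.time);
  }
  const flagged = [];
  for (const [ip, times] of byIp) {
    times.sort((a, b) => a - b);
    // Two-pointer sliding window over the sorted timestamps
    let start = 0;
    for (let end = 0; end < times.length; end++) {
      while (times[end] - times[start] >= windowMs) start++;
      if (end - start + 1 >= maxFailures) {
        flagged.push(ip);
        break;
      }
    }
  }
  return flagged;
}

const oneMinute = 60 * 1000;
const events = [
  ...[0, 1, 2, 3, 4].map((m) => ({ ip: '203.0.113.7', time: m * oneMinute })), // 5 within 4 min
  ...[0, 15, 30].map((m) => ({ ip: '198.51.100.2', time: m * oneMinute }))     // spread out
];
console.log(detectBruteForce(events)); // [ '203.0.113.7' ]
```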

### 12.2 Security Hardening

#### 12.2.1 System Security Configuration

```bash
# Firewall: allow SSH before enabling ufw so the current session is not cut off
ufw allow 22/tcp
ufw allow 80/tcp
ufw allow 443/tcp
ufw enable

# SSH hardening (confirm key-based login works before disabling password authentication).
# The patterns match both commented and uncommented defaults.
sed -i -E 's/^#?PermitRootLogin .*/PermitRootLogin no/' /etc/ssh/sshd_config
sed -i -E 's/^#?PasswordAuthentication .*/PasswordAuthentication no/' /etc/ssh/sshd_config
sshd -t && systemctl restart sshd   # validate the configuration before restarting

# Automatic security updates
apt install -y unattended-upgrades
dpkg-reconfigure -plow unattended-upgrades
```

## 13. Contacts

### 13.1 Operations Team

- **Operations lead**: Zhang San (张三) (zhangsan@nxxmdata.com)
- **System administrator**: Li Si (李四) (lisi@nxxmdata.com)
- **Security specialist**: Wang Wu (王五) (wangwu@nxxmdata.com)

### 13.2 Emergency Contacts

- **24/7 on-call phone**: 400-xxx-xxxx
- **Emergency email**: emergency@nxxmdata.com
- **DingTalk group**: 宁夏智慧养殖运维群 (Ningxia Smart Farming Operations group)

---

**Document maintenance**: This document is updated periodically as system operations evolve.

**Last updated**: 2025-01-19