Getting Started with Prometheus
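Overview
Prometheus is a modern monitoring system that integrates closely with Kubernetes, which makes it a go-to choice for container monitoring. Its main components are:
- Prometheus Server: the main program, which also serves as a time-series database
- AlertManager: the alerting component
- Pushgateway: an intermediary gateway component
- Data visualization and export: components for displaying data
- Service discovery: the service discovery component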

Deployment
Server
    # Deploy the Prometheus server
    podman pull docker.io/prom/prometheus:v3.1.0
    mkdir ~/prometheus

    # Create the configuration file
    cat > ~/prometheus/prometheus.yml <<EOF
    global:
      scrape_interval: 15s
      scrape_timeout: 10s
      evaluation_interval: 15s

    alerting:
      alertmanagers:
        - follow_redirects: true
          scheme: http
          timeout: 10s
          static_configs:
            - targets: []

    scrape_configs:
      - job_name: "prometheus"
        honor_timestamps: true
        scrape_interval: 15s
        scrape_timeout: 10s
        metrics_path: /metrics
        scheme: http
        follow_redirects: true
        static_configs:
          - targets:
              - localhost:9090

      - job_name: "node"
        honor_timestamps: true
        scrape_interval: 15s
        scrape_timeout: 10s
        metrics_path: /metrics
        scheme: http
        follow_redirects: true
        static_configs:
          - targets:
              - 192.168.24.10:9100
    EOF

    # Start the container with the configuration file mounted
    podman run --name prometheus -d -p 9090:9090 -v ~/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml docker.io/prom/prometheus:v3.1.0

    # Open the firewall port
    firewall-cmd --permanent --add-service=prometheus
    firewall-cmd --reload

    # Enable start on boot
    podman generate systemd --name prometheus > ~/prometheus/prometheus.service
    cp ~/prometheus/prometheus.service /etc/systemd/system/
    systemctl daemon-reload
    systemctl enable --now prometheus.service
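Once the container is up, it can be useful to confirm that every scrape target is healthy before moving on. The sketch below assumes the server answers on localhost:9090 and uses the /api/v1/targets endpoint of the Prometheus HTTP API. The helper name count_health is made up for this example, and the pattern match is only a rough stand-in for a real JSON parser such as jq.

```shell
#!/usr/bin/env bash
# count_health (hypothetical helper): count targets in a given health state.
# Reads the JSON returned by /api/v1/targets on stdin and counts how often
# "health":"<state>" appears. This is a plain pattern match, not a JSON parser.
count_health() {
  awk -v pat="\"health\":\"$1\"" '{ n += gsub(pat, "&") } END { print n + 0 }'
}

# Example (assumes Prometheus listens on localhost:9090):
#   curl -s http://localhost:9090/api/v1/targets | count_health up
#   curl -s http://localhost:9090/api/v1/targets | count_health down
```

With the configuration above, two targets in the up state would mean that both the prometheus job and the node job are being scraped successfully.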
 
Node-exporter
    # Install the node metrics collector
    podman pull prom/node-exporter:v1.8.2
    # Name the container explicitly so that later commands can refer to it
    podman run --name node-exporter -d -p 9100:9100 prom/node-exporter:v1.8.2

    # Open the firewall port
    firewall-cmd --permanent --add-service=prometheus-node-exporter
    firewall-cmd --reload

    # Enable start on boot
    podman generate systemd --name node-exporter > ~/prometheus/node-exporter.service
    cp ~/prometheus/node-exporter.service /etc/systemd/system/
    systemctl daemon-reload
    systemctl enable --now node-exporter.service
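To check that the exporter is really serving data, you can read a single metric straight from its /metrics endpoint. The following is a minimal sketch: metric_value is a hypothetical helper name, it only handles metrics without labels, and the address 192.168.24.10:9100 is taken from the scrape configuration above.

```shell
#!/usr/bin/env bash
# metric_value (hypothetical helper): print the value of an unlabelled metric
# from Prometheus text-format input on stdin. HELP/TYPE comment lines start
# with "#", so their first field never equals the metric name.
metric_value() {
  awk -v name="$1" '$1 == name { print $2; exit }'
}

# Example (assumes the exporter from this section is reachable):
#   curl -s http://192.168.24.10:9100/metrics | metric_value node_load1
```

An empty result means the metric name was not found in the output, which usually points at a typo or an exporter that is not running.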
 
Grafana
    # Install the web UI
    podman pull grafana/grafana:11.4.0
    mkdir ~/grafana_data
    # Note: the official image keeps its data under /var/lib/grafana by default;
    # adjust the mount target if dashboards should persist in ~/grafana_data
    podman run --name grafana -d -p 3000:3000 -v ~/grafana_data/:/grafana/db:Z grafana/grafana:11.4.0

    # Open the firewall port
    firewall-cmd --permanent --add-service=grafana
    firewall-cmd --reload

    # Enable start on boot
    podman generate systemd --name grafana > /etc/systemd/system/grafana.service
    systemctl daemon-reload
    systemctl enable --now grafana.service
 
Visualization
Server status

Log in

Add a data source
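The data source can also be registered without clicking through the UI by using Grafana's provisioning files. The sketch below is one possible approach, not the method used in this post: write_datasource is a hypothetical helper, and the URL you pass must be the address at which the Grafana container can reach Prometheus (192.168.24.10:9090 is assumed in the example).

```shell
#!/usr/bin/env bash
# write_datasource (hypothetical helper): write a Grafana provisioning file
# that registers Prometheus as the default data source.
#   $1 = output file
#   $2 = Prometheus URL as seen from the Grafana container
write_datasource() {
  local file="$1" url="$2"
  cat > "$file" <<EOF
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: ${url}
    isDefault: true
EOF
}

# Example (directory name and address are assumptions):
#   mkdir -p ~/grafana_provisioning
#   write_datasource ~/grafana_provisioning/prometheus.yml http://192.168.24.10:9090
```

Grafana reads such files from /etc/grafana/provisioning/datasources at startup, so the directory would have to be mounted there when the container is created.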

Add a dashboard
Import dashboard 21559

View the collected data

Monitoring
Monitoring hosts
Install the exporter
    # Add a new host, 192.168.24.100
    # Deploy node-exporter with the package manager
    dnf install -y node-exporter
    systemctl enable --now prometheus-node-exporter.service

    # Open the port
    firewall-cmd --permanent --add-port=9100/tcp
    firewall-cmd --reload
 
Configuration file
    # Edit prometheus.yml and add the new node to the "node" job
      - job_name: "node"
        honor_timestamps: true
        scrape_interval: 15s
        scrape_timeout: 10s
        metrics_path: /metrics
        scheme: http
        follow_redirects: true
        static_configs:
          - targets:
              - 192.168.24.10:9100
              - 192.168.24.100:9100

    # Restart the server container
    podman restart prometheus
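When the list of hosts grows, typing each stanza by hand invites indentation mistakes. As a rough sketch, the hypothetical helper scrape_job below prints a minimal static job for any number of targets. It omits the optional settings shown above (they fall back to the global values), and its output is meant to be pasted under scrape_configs, not appended blindly.

```shell
#!/usr/bin/env bash
# scrape_job (hypothetical helper): print a static scrape job for prometheus.yml.
# Usage: scrape_job <job_name> <target> [<target> ...]
scrape_job() {
  local job="$1"
  shift
  printf '  - job_name: "%s"\n' "$job"
  printf '    static_configs:\n'
  printf '      - targets:\n'
  local target
  for target in "$@"; do
    printf '          - %s\n' "$target"
  done
}

# Example: the two node-exporter hosts used in this post
#   scrape_job node 192.168.24.10:9100 192.168.24.100:9100
```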
 
Result

Monitoring Podman
Install the exporter
    # Container deployment
    # Pull the Podman exporter image
    podman pull quay.io/navidys/prometheus-podman-exporter:v1.14.0
    systemctl enable --now podman.socket
    podman run -e CONTAINER_HOST=unix:///run/podman/podman.sock -v /run/podman/podman.sock:/run/podman/podman.sock -u root -p 9882:9882 --security-opt label=disable quay.io/navidys/prometheus-podman-exporter:v1.14.0

    # Alternatively, deploy from a package
    dnf -y install prometheus-podman-exporter
    systemctl enable --now prometheus-podman-exporter.service

    # Open the port
    firewall-cmd --permanent --add-port=9882/tcp
    firewall-cmd --reload
 
Configuration file
    # Edit prometheus.yml and add a job for Podman
      - job_name: "podman"
        honor_timestamps: true
        scrape_interval: 15s
        scrape_timeout: 10s
        metrics_path: /metrics
        scheme: http
        follow_redirects: true
        static_configs:
          - targets:
              - 192.168.24.10:9882
              - 192.168.24.100:9882

    # Restart the server container
    podman restart prometheus
 
Result
Import dashboard 21559

Monitoring Nginx
Configuration file
    # Install Nginx and the VTS module
    dnf install -y nginx nginx-mod-vts

    # Edit the Nginx configuration
    http {
        ……
        vhost_traffic_status_zone;    # add this line

        server {
            ……
            # add the following block
            location /status {
                vhost_traffic_status_display;
                vhost_traffic_status_display_format html;
            }
        }
    }

    # Start the service
    systemctl enable --now nginx
    firewall-cmd --permanent --add-service={http,https}
    firewall-cmd --reload
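The steps above expose the status page but do not yet tell Prometheus to scrape it. The VTS module can also serve its counters in Prometheus format under the format/prometheus sub-path of the display location, so a job along the following lines should work. This is a sketch under assumptions: nginx_vts_job is a hypothetical helper, and the target address depends on which host runs Nginx, which the post does not state.

```shell
#!/usr/bin/env bash
# nginx_vts_job (hypothetical helper): print a scrape job for the VTS module.
# The metrics path assumes the "location /status" block configured above.
#   $1 = host:port of the Nginx server (an assumption, adjust as needed)
nginx_vts_job() {
  local target="$1"
  cat <<EOF
  - job_name: "nginx"
    metrics_path: /status/format/prometheus
    scheme: http
    static_configs:
      - targets:
          - ${target}
EOF
}

# Example (address is an assumption):
#   nginx_vts_job 192.168.24.100:80
```

After adding the printed stanza under scrape_configs, restart the server container as in the previous sections.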
 

Result
Import dashboard 9785

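Alerting
Alerting in the Prometheus architecture is split into two independent parts: alert rules, which Prometheus evaluates, and Alertmanager, which handles the resulting alerts. You define alert rules (AlertRule) in Prometheus, which evaluates them periodically and sends an alert to Alertmanager whenever a rule's trigger condition is met.

An alert rule in Prometheus consists mainly of the following parts:
- Alert name: a name for the rule, which should state directly what the alert is about
- Alert rule: the condition itself, defined mainly in PromQL; the alert fires once the expression (PromQL) has kept returning results for a given duration (the "for" setting)

Alertmanager is a standalone component that receives and processes alerts from Prometheus Server (or from other client programs). It can process these alerts further: for example, it removes duplicates when it receives a large number of repeated alerts, groups alerts, and routes them to the correct recipient. Alertmanager has built-in support for email, Slack, and other notification channels, and it also integrates with webhooks to cover more customized scenarios.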
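To make the parts of an alert rule concrete, here is a minimal sketch of a rule file. The group name, the alert name InstanceDown, the 5m duration, and the helper write_rules are all illustrative choices, not something taken from this setup; the expression uses the built-in up metric, which is 0 when a scrape fails.

```shell
#!/usr/bin/env bash
# write_rules (hypothetical helper): write an example Prometheus rule file.
#   $1 = output file
# The quoted 'EOF' keeps the shell from expanding $labels inside the template.
write_rules() {
  local file="$1"
  cat > "$file" <<'EOF'
groups:
  - name: example-alerts
    rules:
      - alert: InstanceDown
        expr: up == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }} has been unreachable for 5 minutes"
EOF
}

# Example (file location is an assumption):
#   write_rules ~/prometheus/rules.yml
```

For Prometheus to load the file, it would need to be listed under rule_files in prometheus.yml and mounted into the container next to the main configuration.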

Alertmanager
Create the configuration file
Alertmanager configuration
    # Default configuration file
    global:
      resolve_timeout: 5m

    route:
      group_by: ['alertname']
      group_wait: 10s
      group_interval: 10s
      repeat_interval: 1h
      receiver: 'web.hook'
    receivers:
      - name: 'web.hook'
        webhook_configs:
          - url: 'http://127.0.0.1:5001/'
    inhibit_rules:
      - source_match:
          severity: 'critical'
        target_match:
          severity: 'warning'
        equal: ['alertname', 'dev', 'instance']

    # Configuration file for email notifications
    global:
      smtp_smarthost: 'smtp.gmail.com:587'
      smtp_from: <smtp mail from>
      smtp_auth_username: <username>
      smtp_auth_identity: <username>
      smtp_auth_password: <password>

    route:
      group_by: ['alertname']
      receiver: 'default-receiver'

    receivers:
      - name: default-receiver
        email_configs:
          - to: <mail to address>
            send_resolved: true
 
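Connect Prometheus to Alertmanager
    # In prometheus.yml. Note: inside the Prometheus container, localhost refers
    # to that container itself; use the host's address if Alertmanager runs in
    # a separate container.
    alerting:
      alertmanagers:
        - static_configs:
            - targets: ['localhost:9093']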
 
Start Alertmanager
    # Pull the image and start the container
    podman pull prom/alertmanager:v0.28.0
    podman run --name alertmanager -d -p 9093:9093 -v ~/prometheus/alertmanager.yml:/etc/alertmanager/alertmanager.yml prom/alertmanager:v0.28.0

    # Open the firewall port
    firewall-cmd --permanent --add-port=9093/tcp
    firewall-cmd --reload

    # Enable start on boot
    podman generate systemd --name alertmanager > /etc/systemd/system/alertmanager.service
    systemctl daemon-reload
    systemctl enable --now alertmanager.service
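Before waiting for a real rule to fire, the notification path can be tried out by posting a hand-made alert to Alertmanager's v2 API. The sketch below assumes Alertmanager listens on localhost:9093. The helper alert_payload and the label values are invented for the test, and the function does no JSON escaping, so keep the arguments free of quotes.

```shell
#!/usr/bin/env bash
# alert_payload (hypothetical helper): build the JSON body for a test alert.
#   $1 = alert name, $2 = severity, $3 = instance label
# No JSON escaping is performed; arguments must not contain quotes.
alert_payload() {
  local name="$1" severity="$2" instance="$3"
  printf '[{"labels":{"alertname":"%s","severity":"%s","instance":"%s"}}]\n' \
    "$name" "$severity" "$instance"
}

# Example (assumes Alertmanager is reachable on localhost:9093):
#   alert_payload TestAlert warning 192.168.24.10:9100 |
#     curl -s -X POST -H 'Content-Type: application/json' -d @- \
#       http://localhost:9093/api/v2/alerts
```

If routing and the receiver are set up correctly, the test alert should show up in the Alertmanager web UI and be delivered to the configured receiver.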