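Prometheus Black-Box Monitoring (Part 1)

Posted by 数栈君 on 2024-03-19 17:52

1. Background

Black-box monitoring focuses on symptoms: things that are happening right now, such as an alert firing or a business API misbehaving. It observes the system from the user's point of view, and its main purpose is to alert on failures that are currently in progress.

2. Prerequisites and assumptions

The steps below assume Prometheus running in Kubernetes (k8s) and managed with Helm. If your environment differs, adjust the steps accordingly.

3. Procedure

The detailed steps are as follows.

Install blackbox_exporter with Helm

Add the Helm repository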

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

Install the chart

helm install [RELEASE_NAME] prometheus-community/prometheus-blackbox-exporter

Uninstall the chart

helm uninstall [RELEASE_NAME]

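Once the installation completes, the resources created by the chart (such as the Deployment, Service, and ConfigMap) should be visible in the cluster.

Configure blackbox_exporter

Edit the blackbox_exporter configuration file

First we configure blackbox_exporter itself. Its configuration file is blackbox.yml. When blackbox_exporter is installed with Helm, this file is stored in a ConfigMap. Let's take a look at it.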

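# List the ConfigMaps in the prometheus namespace
kubectl get configMap -n prometheus
# Print the blackbox_exporter ConfigMap, including blackbox.yml
kubectl get configMap -n prometheus prometheus-blackbox-exporter -o yaml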

Official sample configuration

Here is the official sample configuration file. Adapt it to your own needs.

modules:
  http_2xx_example:
    prober: http
    timeout: 5s
    http:
      valid_http_versions: ["HTTP/1.1", "HTTP/2.0"]
      valid_status_codes: []  # Defaults to 2xx
      method: GET
      headers:
        Host: vhost.example.com
        Accept-Language: en-US
        Origin: example.com
      no_follow_redirects: false
      fail_if_ssl: false
      fail_if_not_ssl: false
      fail_if_body_matches_regexp:
        - "Could not connect to database"
      fail_if_body_not_matches_regexp:
        - "Download the latest version here"
      fail_if_header_matches: # Verifies that no cookies are set
        - header: Set-Cookie
          allow_missing: true
          regexp: '.*'
      fail_if_header_not_matches:
        - header: Access-Control-Allow-Origin
          regexp: '(\*|example\.com)'
      tls_config:
        insecure_skip_verify: false
      preferred_ip_protocol: "ip4" # defaults to "ip6"
      ip_protocol_fallback: false  # no fallback to "ip6"
  http_post_2xx:
    prober: http
    timeout: 5s
    http:
      method: POST
      headers:
        Content-Type: application/json
      body: '{}'
  http_basic_auth_example:
    prober: http
    timeout: 5s
    http:
      method: POST
      headers:
        Host: "login.example.com"
      basic_auth:
        username: "username"
        password: "mysecret"
  http_custom_ca_example:
    prober: http
    http:
      method: GET
      tls_config:
        ca_file: "/certs/my_cert.crt"
  http_gzip:
    prober: http
    http:
      method: GET
      compression: gzip
  http_gzip_with_accept_encoding:
    prober: http
    http:
      method: GET
      compression: gzip
      headers:
        Accept-Encoding: gzip
  tls_connect:
    prober: tcp
    timeout: 5s
    tcp:
      tls: true
  tcp_connect_example:
    prober: tcp
    timeout: 5s
  imap_starttls:
    prober: tcp
    timeout: 5s
    tcp:
      query_response:
        - expect: "OK.*STARTTLS"
        - send: ". STARTTLS"
        - expect: "OK"
        - starttls: true
        - send: ". capability"
        - expect: "CAPABILITY IMAP4rev1"
  smtp_starttls:
    prober: tcp
    timeout: 5s
    tcp:
      query_response:
        - expect: "^220 ([^ ]+) ESMTP (.+)$"
        - send: "EHLO prober\r"
        - expect: "^250-STARTTLS"
        - send: "STARTTLS\r"
        - expect: "^220"
        - starttls: true
        - send: "EHLO prober\r"
        - expect: "^250-AUTH"
        - send: "QUIT\r"
  irc_banner_example:
    prober: tcp
    timeout: 5s
    tcp:
      query_response:
        - send: "NICK prober"
        - send: "USER prober prober prober :prober"
        - expect: "PING :([^ ]+)"
          send: "PONG ${1}"
        - expect: "^:[^ ]+ 001"
  icmp_example:
    prober: icmp
    timeout: 5s
    icmp:
      preferred_ip_protocol: "ip4"
      source_ip_address: "127.0.0.1"
  dns_udp_example:
    prober: dns
    timeout: 5s
    dns:
      query_name: "www.prometheus.io"
      query_type: "A"
      valid_rcodes:
      - NOERROR
      validate_answer_rrs:
        fail_if_matches_regexp:
        - ".*127.0.0.1"
        fail_if_all_match_regexp:
        - ".*127.0.0.1"
        fail_if_not_matches_regexp:
        - "www.prometheus.io.\t300\tIN\tA\t127.0.0.1"
        fail_if_none_matches_regexp:
        - "127.0.0.1"
      validate_authority_rrs:
        fail_if_matches_regexp:
        - ".*127.0.0.1"
      validate_additional_rrs:
        fail_if_matches_regexp:
        - ".*127.0.0.1"
  dns_soa:
    prober: dns
    dns:
      query_name: "prometheus.io"
      query_type: "SOA"
  dns_tcp_example:
    prober: dns
    dns:
      transport_protocol: "tcp" # defaults to "udp"
      preferred_ip_protocol: "ip4" # defaults to "ip6"
      query_name: "www.prometheus.io"

The http_2xx module

A quick look at the http_2xx module:

modules:
  http_2xx:
    prober: http
    timeout: 5s
    http:
      valid_http_versions: ["HTTP/1.1", "HTTP/2.0"]
      valid_status_codes: [200]  # Only 200 is accepted; an empty list defaults to 2xx
      method: GET
      headers:
        Host: blackbox.ilomumu.xyz
        Accept-Language: en-US
        Origin: ilomumu.xyz
      # Whether to skip following redirects (false: redirects are followed)
      no_follow_redirects: false
      # Fail the probe if SSL is used
      fail_if_ssl: false
      # Fail the probe if SSL is not used
      fail_if_not_ssl: false
      # Fail the probe if the body matches
      fail_if_body_matches_regexp:
        - "Could not connect to database"
      # Fail the probe if the body does not match
      fail_if_body_not_matches_regexp:
        - "Download the latest version here"
      # Fail the probe if the header matches
      fail_if_header_matches: # Verifies that no cookies are set
        - header: Set-Cookie
          allow_missing: true
          regexp: '.*'
      # Fail the probe if the header does not match
      fail_if_header_not_matches:
        - header: Access-Control-Allow-Origin
          regexp: '(\*|example\.com)'
      # TLS certificate settings
      tls_config:
        insecure_skip_verify: false
      preferred_ip_protocol: "ip4" # defaults to "ip6"
      ip_protocol_fallback: false  # no fallback to "ip6"
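To make the meaning of a few of these fields concrete, here is a deliberately simplified Python model of how the status-code, SSL, and body checks combine. It is a sketch for intuition only and is not the exporter's actual implementation; the function name probe_succeeds and the dictionary layout are assumptions made for this illustration, and header checks are left out.

```python
import re

def probe_succeeds(status, body, used_tls, module):
    """Simplified model of a few http prober checks (not the exporter's real code)."""
    valid_codes = module.get("valid_status_codes") or []
    if valid_codes:
        if status not in valid_codes:
            return False
    elif not 200 <= status < 300:  # an empty list means any 2xx is accepted
        return False
    if module.get("fail_if_ssl") and used_tls:
        return False
    if module.get("fail_if_not_ssl") and not used_tls:
        return False
    # Fail when the body matches any of the "must not match" patterns
    for pattern in module.get("fail_if_body_matches_regexp", []):
        if re.search(pattern, body):
            return False
    # Fail when the body misses any of the "must match" patterns
    for pattern in module.get("fail_if_body_not_matches_regexp", []):
        if not re.search(pattern, body):
            return False
    return True

module = {
    "valid_status_codes": [200],
    "fail_if_ssl": False,
    "fail_if_not_ssl": False,
    "fail_if_body_matches_regexp": ["Could not connect to database"],
    "fail_if_body_not_matches_regexp": ["Download the latest version here"],
}

ok = probe_succeeds(200, "Download the latest version here", True, module)
db_down = probe_succeeds(200, "Could not connect to database", True, module)
wrong_code = probe_succeeds(204, "Download the latest version here", True, module)
print(ok, db_down, wrong_code)
```

With valid_status_codes set to [200], a 204 response fails even though it is a 2xx code.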

The configuration we use

modules:
  http_2xx:
    prober: http
    timeout: 5s
    http:
      valid_http_versions:
        - HTTP/1.1
        - HTTP/2.0
      valid_status_codes:
        - 200
      method: GET
      headers:
        Host: blackbox.ilomumu.xyz
        Accept-Language: en-US
        Origin: ilomumu.xyz
      # follow_redirects is used here because no_follow_redirects is being replaced by follow_redirects in future versions
      follow_redirects: true
      fail_if_ssl: false
      fail_if_not_ssl: false
      tls_config:
        insecure_skip_verify: true
      preferred_ip_protocol: ip4
      ip_protocol_fallback: false

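After changing the configuration, redeploy the blackbox_exporter Deployment so that the new configuration takes effect.

Configure Prometheus

Add blackbox_exporter to Prometheus

For a Prometheus deployed from the binary, wiring in blackbox_exporter is straightforward: edit prometheus.yml and add the corresponding job under scrape_configs.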

scrape_configs:
  - job_name: 'blackbox'
    metrics_path: /probe
    params:
      module: [http_2xx]  # Look for a HTTP 200 response.
    static_configs:
      - targets:
        - http://prometheus.io    # Target to probe with http.
        - https://prometheus.io   # Target to probe with https.
        - http://example.com:8080 # Target to probe with http on port 8080.
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 127.0.0.1:9115  # The blackbox exporter's real hostname:port.

Adjust the configuration above to your needs.
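The three relabel rules in this scrape config can be read as plain label assignments. The following Python sketch is only a mental model and not Prometheus code; the function name relabel_static_target is made up for this illustration.

```python
def relabel_static_target(address, exporter_address="127.0.0.1:9115"):
    """Mimic the three relabel_configs rules from the blackbox scrape job."""
    labels = {"__address__": address}
    # Rule 1: copy the configured target into the `target` URL parameter
    labels["__param_target"] = labels["__address__"]
    # Rule 2: keep the probed URL as the instance label
    labels["instance"] = labels["__param_target"]
    # Rule 3: send the scrape itself to blackbox_exporter
    labels["__address__"] = exporter_address
    return labels

result = relabel_static_target("https://prometheus.io")
print(result)
```

After relabeling, Prometheus scrapes the exporter at 127.0.0.1:9115, while the original target survives both as the target parameter and as the instance label.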

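Add blackbox_exporter to Prometheus Operator

Write the ServiceMonitor

We, however, run Prometheus Operator. Editing prometheus.yml inside the container is clearly impractical, and if we inspect the ConfigMaps we find no prometheus.yml there either. So how do we change the scrape configuration?

This is where the ServiceMonitor resource comes in: through a ServiceMonitor we can use Prometheus's service discovery.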

kubectl get ServiceMonitor -A

So we need to write a ServiceMonitor that registers blackbox_exporter with Prometheus.

The relabelings section in the file below may be hard to follow; a later section explains why it is written this way.

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: prometheus-blackbox-exporter
  namespace: prometheus
spec:
  endpoints:
  - params:
      # Module used for probing, i.e. a probe definition from our blackbox.yml
      module:
      - http_2xx
    path: /probe # Fixed path: the probe endpoint exposed by blackbox-exporter
    port: web # Port to use (the name of the Service port)
    relabelings:
    # This rule builds the custom address and path to probe:
    # it replaces the default target with the address and path we actually want to monitor
    - action: replace
      # Regex matched against the joined source label values
      regex: (.*)
      # Replacement value; the regex has a single capture group, so ${2} expands to an empty string
      replacement: ${1}${2}
      # Separator placed between the source label values
      separator: /
      # Source labels; two built-in labels are used here
      sourceLabels:
      # Address of the target discovered for the monitored svc
      - __address__
      # Value of the svc label prometheus.io/http-probe-path
      - __meta_kubernetes_service_label_prometheus_io_http_probe_path
      # Label that receives the result
      targetLabel: __param_target
    # This rule replaces the default scrape address with the address of blackbox_exporter
    - action: replace
      regex: (.*)
      # Address of the blackbox-exporter svc
      # For cross-namespace access, write it as {SERVICE_NAME}.{NAMESPACE_NAME}.svc.cluster.local
      replacement: prometheus-blackbox-exporter.prometheus.svc.cluster.local:9115
      sourceLabels:
      - __address__
      targetLabel: __address__
    # Add a label so the target is easy to identify later
    - action: replace
      regex: (.*)
      replacement: $1
      sourceLabels:
      - __param_target
      targetLabel: instance
    # Add a label so the module is easy to identify later
    - action: replace
      regex: (.*)
      replacement: $1
      sourceLabels:
      - __param_module
      targetLabel: module
  # Namespace restriction
  namespaceSelector:
    # Allow all namespaces
    any: true
  # Label selector; controls which svcs are monitored
  selector:
    matchLabels:
      blackbox-monitor: "true"

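[Important] Why the relabelings are configured this way

First, we need to understand how black-box monitoring fundamentally works.

As described earlier, black-box monitoring boils down to testing the externally visible behavior of a service from the user's point of view.

So who is the user here?

blackbox_exporter plays that role. The blackbox_exporter pod sends requests to the monitored targets and turns the outcome of those requests into metrics.

So to collect those metrics, Prometheus has to scrape the page served by the blackbox_exporter pod.

For example:

📌 To monitor http://www.baidu.com with the http_2xx module of blackbox_exporter, we would request the following address: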

http://blackbox_exporter:9115/probe?module=http_2xx&target=www.baidu.com

📌 Here the address to monitor and the module are both passed as URL parameters, while the request itself still goes to port 9115 of blackbox_exporter with the path /probe.
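As a small illustration of how such a probe URL is assembled, the following Python sketch builds it from its three parts. The helper name build_probe_url is an assumption made for this example; only the /probe path and the module and target parameters come from the text above.

```python
from urllib.parse import urlencode

def build_probe_url(exporter, module, target):
    """Build the URL that is requested on blackbox_exporter for one probe."""
    query = urlencode({"module": module, "target": target})
    return f"http://{exporter}/probe?{query}"

url = build_probe_url("blackbox_exporter:9115", "http_2xx", "www.baidu.com")
url_with_path = build_probe_url(
    "blackbox_exporter:9115", "http_2xx", "http://www.baidu.com/healthz"
)
print(url)
print(url_with_path)
```

When the target contains a scheme or a path, it is percent-encoded inside the query string, so the exporter still receives it as a single target value.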

If we do not configure relabelings to rewrite these fields, then with the default behavior (scrape_configs in prometheus.yml) Prometheus requests the /probe path of each monitored svc directly, and no target parameter is set. The endpoint shown on the Prometheus targets page would then be the following:

# Illustrative URL; the Prometheus targets page does not show the query string in the URL and lists the parameters separately
http://<address of the monitored svc>/probe?module=<module>

# How the Prometheus targets page shows it
endpoint: http://<address of the monitored svc>/probe
module=<module>

# Example: monitoring www.baidu.com
http://www.baidu.com/probe?module=http_2xx

The endpoint we actually need looks like this:

# Illustrative URL; the Prometheus targets page does not show the query string in the URL and lists the parameters separately
http://<blackbox_exporter>:9115/probe?module=<module>&target=<address of the monitored svc>

# How the Prometheus targets page shows it
endpoint: http://<blackbox_exporter>:9115/probe
module=<module>
target=<address of the monitored svc>

# Example: monitoring www.baidu.com
http://blackbox_exporter:9115/probe?module=http_2xx&target=www.baidu.com

The entire relabelings section exists to turn the default endpoint into the endpoint we need.
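To trace that transformation step by step, the following Python sketch approximates the `replace` relabel action and applies the four ServiceMonitor rules to one discovered target. It is a rough model under stated assumptions, not Prometheus's implementation: the helper replace_action and the address 10.0.0.12:8080 are hypothetical, and edge cases such as dropping empty labels are ignored.

```python
import re

def replace_action(labels, source_labels, target_label,
                   regex="(.*)", replacement="$1", separator=";"):
    """Rough model of the Prometheus `replace` relabel action."""
    value = separator.join(labels.get(name, "") for name in source_labels)
    match = re.fullmatch(regex, value)  # the regex is anchored at both ends
    if match is None:
        return labels  # no match: nothing changes

    def expand(ref):
        index = int(ref.group(1) or ref.group(2))
        # A reference to a group the regex does not have expands to ""
        return (match.group(index) or "") if index <= match.re.groups else ""

    updated = dict(labels)
    updated[target_label] = re.sub(r"\$(?:\{(\d+)\}|(\d+))", expand, replacement)
    return updated

PATH_LABEL = "__meta_kubernetes_service_label_prometheus_io_http_probe_path"
EXPORTER = "prometheus-blackbox-exporter.prometheus.svc.cluster.local:9115"

labels = {
    "__address__": "10.0.0.12:8080",  # hypothetical discovered address
    PATH_LABEL: "healthz",
    "__param_module": "http_2xx",
}
# Rule 1: join address and path with "/" and store the result as the probe target
labels = replace_action(labels, ["__address__", PATH_LABEL], "__param_target",
                        replacement="${1}${2}", separator="/")
# Rule 2: point the scrape at blackbox_exporter instead of the monitored svc
labels = replace_action(labels, ["__address__"], "__address__", replacement=EXPORTER)
# Rules 3 and 4: copy target and module into readable labels
labels = replace_action(labels, ["__param_target"], "instance")
labels = replace_action(labels, ["__param_module"], "module")
print(labels["__address__"], labels["__param_target"], labels["module"])
```

Because the regex (.*) has a single capture group, ${1} already holds the whole joined string and ${2} contributes nothing.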

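Configure the svc to monitor

From the ServiceMonitor above, we need two labels on a svc so that Prometheus discovers it:

# Both of the following are labels

# This label decides whether blackbox monitoring is enabled
blackbox-monitor: "true"

# This label sets the health-check path to probe (http://address:port/healthz)
# Do not add a leading '/'
# If there is no health-check path, leave it out; the root path is probed by default
prometheus.io/http-probe-path: healthz

Note that label values may not contain '/', so a multi-level health-check path cannot be expressed with a label.

For a multi-level health-check path, use annotations instead.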
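The restriction can be checked with a few lines of Python. The sketch below encodes the Kubernetes rule for label values (at most 63 characters, alphanumeric at both ends, with '-', '_' and '.' allowed in between); the function name is_valid_label_value is chosen for this example.

```python
import re

# Kubernetes label value syntax: empty, or alphanumeric at both ends
# with '-', '_', '.' and alphanumerics allowed in between
LABEL_VALUE = re.compile(r"(([A-Za-z0-9][-A-Za-z0-9_.]*)?[A-Za-z0-9])?")

def is_valid_label_value(value):
    """Return True if value is acceptable as a Kubernetes label value."""
    return len(value) <= 63 and LABEL_VALUE.fullmatch(value) is not None

single_level = is_valid_label_value("healthz")
multi_level = is_valid_label_value("check/healthz")
print(single_level, multi_level)
```

A single-level path such as healthz passes, while check/healthz is rejected because of the slash.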

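Use annotations for multi-level health-check paths

Modify the ServiceMonitor

# Excerpt from the ServiceMonitor
    - action: replace
      # Regex matched against the joined source label values
      regex: (.*)
      # Replacement value; the regex has a single capture group, so ${2} expands to an empty string
      replacement: ${1}${2}
      # Separator placed between the source label values
      separator: /
      # Source labels; two built-in labels are used here
      sourceLabels:
      # Address of the target discovered for the monitored svc
      - __address__
      # Value of the svc label prometheus.io/http-probe-path
      # - __meta_kubernetes_service_label_prometheus_io_http_probe_path
      # Read the value from the annotation instead of the label
      - __meta_kubernetes_service_annotation_prometheus_io_http_probe_path
      # Label that receives the result
      targetLabel: __param_target

The svc configuration changes accordingly:

# Label
# This label decides whether blackbox monitoring is enabled
blackbox-monitor: "true"

# Annotation
# This annotation sets the health-check path to probe (http://address:port/check/healthz)
# Do not add a leading '/'
# If there is no health-check path, leave it out; the root path is probed by default
prometheus.io/http-probe-path: check/healthz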
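The long source label names used in the relabelings follow from the key set on the svc: Prometheus exposes service labels and annotations as meta labels and replaces every character that is not a letter, digit, or underscore with an underscore. The following Python sketch reproduces that naming; the helper service_meta_label is a name invented for this illustration.

```python
import re

def service_meta_label(kind, name):
    """kind is "label" or "annotation"; name is the key set on the svc."""
    sanitized = re.sub(r"[^a-zA-Z0-9_]", "_", name)
    return f"__meta_kubernetes_service_{kind}_{sanitized}"

label_source = service_meta_label("label", "prometheus.io/http-probe-path")
annotation_source = service_meta_label("annotation", "prometheus.io/http-probe-path")
print(label_source)
print(annotation_source)
```

Note that the annotation form uses the singular word annotation in the meta label name.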
