006-Kafka Basic Monitoring

zyh 2019-03-04 18:01:58

Collection tool

https://github.com/danielqsj/kafka_exporter

Installation

appVersion=1.4.2
mkdir -p /export/src
cd /export/src
# -L follows the redirect that GitHub release downloads return; -f makes curl fail on HTTP errors
curl -fL "https://github.com/danielqsj/kafka_exporter/releases/download/v${appVersion}/kafka_exporter-${appVersion}.linux-amd64.tar.gz" -o kafka_exporter.tgz \
  && tar xf kafka_exporter.tgz \
  && mv "kafka_exporter-${appVersion}.linux-amd64" ../kafka_exporter
cd ../kafka_exporter
./kafka_exporter \
--kafka.server=kafka001:8123 \
--sasl.enabled \
--sasl.username="broker" \
--sasl.password="broker" \
--sasl.mechanism="scram-sha256"

ℹ️ By default, kafka_exporter listens on port 9308.

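⚠️ In version 1.4.2, kafka_exporter fails to resolve hostnames correctly on some systems. In that case, pass the broker's IP address to --kafka.server instead of a hostname.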
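
To confirm that the exporter is serving metrics, one option is to fetch its metrics page on the default port 9308 and look for a known metric. The sketch below is only illustrative: `metric_lines` is a hypothetical helper name, `localhost` assumes the check runs on the exporter host, and `kafka_brokers` is the broker-count metric exposed by kafka_exporter.

```shell
# Hypothetical helper: keep only the sample lines of one metric from
# Prometheus text-format output read on stdin (comment lines are dropped).
metric_lines() {
  grep -E "^$1(\{| )"
}

# Example call (assumes the exporter runs locally on its default port):
#   curl -s http://localhost:9308/metrics | metric_lines kafka_brokers
```

If the call prints a kafka_brokers sample, the exporter is reachable and has most likely connected to the cluster.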

Start at boot

/etc/crontab

@reboot root sleep 120;/export/kafka_exporter/kafka_exporter --kafka.server=kafka001:8123 --sasl.enabled --sasl.username="broker" --sasl.password="broker" --sasl.mechanism="scram-sha256" >> /export/kafka_exporter/kafka_exporter.nohup 2>&1
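
The fixed `sleep 120` presumably gives the broker time to come up (an assumption; the source does not say). One possible alternative is to poll the broker port and start the exporter only once it accepts connections. A minimal sketch, assuming the script runs under bash (it relies on bash's /dev/tcp redirection); `wait_for_port` is a hypothetical helper, not part of kafka_exporter.

```shell
# Hypothetical helper: try host:port once per second, up to $3 times
# (default 120). Returns 0 as soon as a TCP connection succeeds,
# 1 if every attempt fails.
wait_for_port() {
  host=$1
  port=$2
  tries=${3:-120}
  while [ "$tries" -gt 0 ]; do
    # /dev/tcp is a bash feature; the subshell closes the socket on exit
    if (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; then
      return 0
    fi
    tries=$((tries - 1))
    sleep 1
  done
  return 1
}
```

A wrapper script could call `wait_for_port kafka001 8123` and, on success, launch kafka_exporter with the same flags as in the crontab line above.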

Monitoring system

Prometheus

Add to the main configuration

  - job_name: "kafka_exporter"
    metrics_path: /metrics
    static_configs:
      - targets: ['kafka001:9308'] # kafka_exporter address and port
        labels:
          instance: it-zz-kafka-cluster
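
Before reloading Prometheus, it can help to validate the edited file. The sketch below assumes `promtool` (shipped with Prometheus) is on PATH; `check_prom_config` is a hypothetical wrapper, and the config path in the example call is an assumption.

```shell
# Hypothetical wrapper around `promtool check config`.
# Returns 127 if promtool is not installed, otherwise promtool's exit status.
check_prom_config() {
  if ! command -v promtool >/dev/null 2>&1; then
    echo "promtool not found in PATH" >&2
    return 127
  fi
  promtool check config "$1"
}

# Example call (the config path is an assumption):
#   check_prom_config /etc/prometheus/prometheus.yml
```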

Add to the alert rules

https://awesome-prometheus-alerts.grep.to/rules#kafka-1

  - alert: KafkaTopicsReplicas
    expr: sum(kafka_topic_partition_in_sync_replica) by (topic) < 3
    for: 0m
    labels:
      severity: critical
    annotations:
      summary: Kafka topics replicas (instance {{ $labels.instance }})
      description: "Kafka topic in-sync partition\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
  - alert: KafkaConsumersGroup
    expr: sum(kafka_consumergroup_lag) by (consumergroup) > 50
    for: 1m
    labels:
      severity: critical
    annotations:
      summary: Kafka consumers group (instance {{ $labels.instance }})
      description: "Kafka consumers group\n  VALUE = {{ $value }}\n  LABELS = {{ $labels }}"
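
The KafkaConsumersGroup expression sums lag over all topics and partitions of each consumer group. To see the same aggregation outside Prometheus, the sketch below totals `kafka_consumergroup_lag` samples per group from the exporter's text output. `lag_by_group` is a hypothetical helper, and the parsing assumes each sample line carries a consumergroup="..." label and no trailing timestamp.

```shell
# Hypothetical helper: read Prometheus text-format metrics on stdin and print
# "<consumergroup> <total lag>" per group, mirroring
# sum(kafka_consumergroup_lag) by (consumergroup).
lag_by_group() {
  awk '
    /^kafka_consumergroup_lag[{]/ {
      # extract the consumergroup label value
      if (match($0, /consumergroup="[^"]*"/)) {
        group = substr($0, RSTART + 15, RLENGTH - 16)
        total[group] += $NF   # the sample value is the last field
      }
    }
    END { for (g in total) print g, total[g] }
  ' | sort
}

# Example call (assumes the exporter runs locally on its default port):
#   curl -s http://localhost:9308/metrics | lag_by_group
```

Any group whose printed total exceeds 50 would match the alert expression above.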

Grafana

Grafana Dashboard ID: 7589, name: Kafka Exporter Overview.