Collection tool
https://github.com/danielqsj/kafka_exporter
Installation
appVersion=1.4.2
mkdir -p /export/src
cd /export/src
curl -L "https://github.com/danielqsj/kafka_exporter/releases/download/v${appVersion}/kafka_exporter-${appVersion}.linux-amd64.tar.gz" -o kafka_exporter.tgz && tar xf kafka_exporter.tgz && mv kafka_exporter-${appVersion}.linux-amd64 ../kafka_exporter
cd ../kafka_exporter
./kafka_exporter \
--kafka.server=kafka001:8123 \
--sasl.enabled \
--sasl.username="broker" \
--sasl.password="broker" \
--sasl.mechanism="scram-sha256"
ℹ️ By default, kafka_exporter listens on port 9308.
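A quick way to confirm the exporter is up is to fetch its metrics page and look for the kafka_brokers gauge. The helper below is only a sketch: the function name broker_count is made up for illustration, and the localhost address in the usage comment assumes the check runs on the same host as the exporter.

```shell
# Print the value of the kafka_brokers gauge from Prometheus exposition
# text read on stdin. Exits non-zero when the metric is absent.
# (broker_count is a hypothetical helper name, not part of kafka_exporter.)
broker_count() {
    awk '$1 == "kafka_brokers" { print $2; found = 1 } END { exit found ? 0 : 1 }'
}

# Example (assumes the exporter runs locally on the default port):
#   curl -s http://localhost:9308/metrics | broker_count
```

If the function prints nothing and returns non-zero, the exporter is reachable but has not reported any brokers, which usually points at the --kafka.server or SASL settings.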
⚠️ In version 1.4.2, kafka_exporter fails to resolve hostnames correctly on some systems; in that case, specify the broker's IP address directly.
Start at boot
/etc/crontab
@reboot root sleep 120;/export/kafka_exporter/kafka_exporter --kafka.server=kafka001:8123 --sasl.enabled --sasl.username="broker" --sasl.password="broker" --sasl.mechanism="scram-sha256" >> /export/kafka_exporter/kafka_exporter.nohup 2>&1
Monitoring system
Prometheus
Add to the main configuration
- job_name: "kafka_exporter"
  metrics_path: /metrics
  static_configs:
    - targets: ['kafka001:9308'] # kafka_exporter address and port
      labels:
        instance: it-zz-kafka-cluster
Add to the alerting rules
https://awesome-prometheus-alerts.grep.to/rules#kafka-1
- alert: KafkaTopicsReplicas
  expr: sum(kafka_topic_partition_in_sync_replica) by (topic) < 3
  for: 0m
  labels:
    severity: critical
  annotations:
    summary: Kafka topics replicas (instance {{ $labels.instance }})
    description: "Kafka topic in-sync partition\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
- alert: KafkaConsumersGroup
  expr: sum(kafka_consumergroup_lag) by (consumergroup) > 50
  for: 1m
  labels:
    severity: critical
  annotations:
    summary: Kafka consumers group (instance {{ $labels.instance }})
    description: "Kafka consumers group\n VALUE = {{ $value }}\n LABELS = {{ $labels }}"
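To get a feel for what the KafkaConsumersGroup rule evaluates, the same per-group sum can be computed by hand from the exporter's metrics page. The sketch below is illustrative only: lag_by_group is a hypothetical helper, and it assumes each kafka_consumergroup_lag sample line ends with its value (no trailing timestamp).

```shell
# Sum kafka_consumergroup_lag per consumer group from exposition text on
# stdin, mirroring: sum(kafka_consumergroup_lag) by (consumergroup).
# Prints one "group total" line per group, sorted by group name.
lag_by_group() {
    awk '
        /^kafka_consumergroup_lag[{]/ && match($0, /consumergroup="[^"]*"/) {
            # Skip the 15-character prefix consumergroup=" and the closing quote.
            group = substr($0, RSTART + 15, RLENGTH - 16)
            lag[group] += $NF
        }
        END { for (g in lag) print g, lag[g] }
    ' | sort
}

# Example (assumes the exporter runs locally on the default port):
#   curl -s http://localhost:9308/metrics | lag_by_group
```

Any group whose total stays above 50 for one minute would trigger the alert as written, so comparing these totals with normal traffic is a reasonable way to choose a threshold.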
Grafana
Grafana Dashboard ID: 7589, name: Kafka Exporter Overview.