Alertmanager

From the linux中国网 wiki
Latest revision as of 11:21, 21 October 2021 (Thursday)

* my email

@126.com

Authorization password

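DingTalk (dingding) robot

Docker series: Grafana + Prometheus + Node-exporter DingTalk notifications (part 4): https://www.cnblogs.com/hong-fithing/p/14868049.html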

telegram

Via prome 

Worth a look when there is time: https://github.com/metalmatze/alertmanager-bot

* Create the TG bot and the alert group

** Create the bot

*** Example of creating a bot (202011)

#2020
evan lai, [29.10.20 16:50]
/start

BotFather, [29.10.20 16:50]
I can help you create and manage Telegram bots. If you're new to the Bot API, please see the manual (https://core.telegram.org/bots).

You can control me by sending these commands:

/newbot - create a new bot
/mybots - edit your bots [beta]

Edit Bots
/setname - change a bot's name
/setdescription - change bot description
/setabouttext - change bot about info
/setuserpic - change bot profile photo
/setcommands - change the list of commands
/deletebot - delete a bot

Bot Settings
/token - generate authorization token
/revoke - revoke bot access token
/setinline - toggle inline mode (https://core.telegram.org/bots/inline)
/setinlinegeo - toggle inline location requests (https://core.telegram.org/bots/inline#location-based-results)
/setinlinefeedback - change inline feedback (https://core.telegram.org/bots/inline#collecting-feedback) settings
/setjoingroups - can your bot be added to groups?
/setprivacy - toggle privacy mode (https://core.telegram.org/bots#privacy-mode) in groups

Games
/mygames - edit your games (https://core.telegram.org/bots/games) [beta]
/newgame - create a new game (https://core.telegram.org/bots/games)
/listgames - get a list of your games
/editgame - edit a game
/deletegame - delete an existing game

BotFather, [29.10.20 16:50]
Alright, a new bot. How are we going to call it? Please choose a name for your bot.

evan lai, [29.10.20 16:50]
/newbot

evan lai, [29.10.20 16:51]
evan_alert_bot

BotFather, [29.10.20 16:51]
Good. Now let's choose a username for your bot. It must end in `bot`. Like this, for example: TetrisBot or tetris_bot.

evan lai, [29.10.20 16:51]
evan_alert_bot

BotFather, [29.10.20 16:51]
Done! Congratulations on your new bot. You will find it at t.me/evan_alert_bot. You can now add a description, about section and profile picture for your bot, see /help for a list of commands. By the way, when you've finished creating your cool bot, ping our Bot Support if you want a better username for it. Just make sure the bot is fully operational before you do this.

Use this token to access the HTTP API:
1363904888:AAGeUIoxxRMlxk9zHUa2MTRi1My9HDBP69w
Keep your token secure and store it safely, it can be used by anyone to control your bot.

For a description of the Bot API, see this page: https://core.telegram.org/bots/api

Useful information



evan lai, [10.05.20 21:55]
lxtx_prom_alert_bot

BotFather, [10.05.20 21:55]
Done! Congratulations on your new bot. You will find it at t.me/lxtx_prom_alert_bot. You can now add a description, about section and profile picture for your bot, see /help for a list of commands. By the way, when you've finished creating your cool bot, ping our Bot Support if you want a better username for it. Just make sure the bot is fully operational before you do this.

Use this token to access the HTTP API:
1157710367:AAFD9YLsjdQ_t7botbVLa4xxWrOc9LVHNYc
Keep your token secure and store it safely, it can be used by anyone to control your bot.

For a description of the Bot API, see this page: https://core.telegram.org/bots/api


Use the API (/bot<token>/<method>) with the method getMe to get the bot's own id


curl https://api.telegram.org/bot1157710367:AAFD9YLsjdQ_t7xxxxxLa4imWrOV9LVHNYc/getMe


# note the letters "bot" in front of the token
sns:~# curl https://api.telegram.org/bot1157710367:AAFD9YLsjdQ_xxxxotbVLa4imWrOV9LVHNYc/getMe
{"ok":true,"result":{"id":1157710367,"is_bot":true,"first_name":"prom_alert_bot","username":"lxtx_prom_alert_bot","can_join_groups":true,"can_read_all_group_messages":false,"supports_inline_queries":false}}
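The token goes directly after the literal `bot` in the URL, which is easy to miss. Below is a minimal sketch of two helpers. It assumes the token is exported in a variable named `TG_BOT_TOKEN`; that name is hypothetical and is not required by Telegram or by the webhook.

```shell
# Build a Bot API URL: https://api.telegram.org/bot<TOKEN>/<METHOD>
# (the literal "bot" sits directly in front of the token, no slash).
tg_api_url() {
  local token="$1" method="$2"
  printf 'https://api.telegram.org/bot%s/%s\n' "$token" "$method"
}

# Call getMe with the token from TG_BOT_TOKEN (hypothetical variable name)
# and print Telegram's raw JSON reply.
tg_get_me() {
  if [ -z "${TG_BOT_TOKEN:-}" ]; then
    echo "TG_BOT_TOKEN is not set" >&2
    return 1
  fi
  curl -s "$(tg_api_url "$TG_BOT_TOKEN" getMe)"
}
```

Usage: `export TG_BOT_TOKEN='<token>'; tg_get_me`. A working token answers with `"ok":true`, like the reply above.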

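Create the group

Get the group ID

In Telegram, create a new group and add the bot you just created (prom_alert_bot) as a member, then call the API method getUpdates to get the group ID.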

 curl https://api.telegram.org/bot1157710367:AAFD9YLsjdQ_t7botbVLa4imWrOV9LVHNYc/getUpdates
{"ok":true,"result":[{"update_id":367831744,
"message":{"message_id":1,"from":{"id":796717144,"is_bot":false,"first_name":"evan","last_name":"lai","username":"linuxsa"},"chat":{"id":-470646458,"title":"alerm","type":"group","all_members_are_administrators":true},"date":1597202656,"new_chat_participant":{"id":1157710367,"is_bot":true,"first_name":"prom_alert_bot","username":"lxtx_prom_alert_bot"},"new_chat_member":{"id":1157710367,"is_bot":true,"first_name":"prom_alert_bot","username":"lxtx_prom_alert_bot"},"new_chat_members":[{"id":1157710367,"is_bot":true,"first_name":"prom_alert_bot","username":"lxtx_prom_alert_bot"}]}}]}
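The sketch below does two things:

- It pulls the group id out of the getUpdates reply without needing jq.
- It makes a direct sendMessage call, so the token/chat id pair can be checked before any Alertmanager wiring.

It assumes the compact JSON that Telegram returns (no spaces after colons) and the hypothetical `TG_BOT_TOKEN` variable holding the token.

```shell
# Print every distinct chat id found in a getUpdates JSON reply read on stdin.
# Only "chat" objects match, so the sender's "from" id is ignored.
# Group ids are negative, e.g. -470646458.
extract_chat_ids() {
  grep -oE '"chat":\{"id":-?[0-9]+' | grep -oE -- '-?[0-9]+$' | sort -u
}

# Send a plain-text test message to a chat (first argument).
# TG_BOT_TOKEN is a hypothetical variable holding the bot token.
tg_send_test() {
  local chat_id="$1" text="${2:-Alertmanager test message}"
  if [ -z "${TG_BOT_TOKEN:-}" ]; then
    echo "TG_BOT_TOKEN is not set" >&2
    return 1
  fi
  curl -s "https://api.telegram.org/bot${TG_BOT_TOKEN}/sendMessage" \
    --data-urlencode "chat_id=${chat_id}" \
    --data-urlencode "text=${text}"
}
```

Usage:

- `curl -s "https://api.telegram.org/bot${TG_BOT_TOKEN}/getUpdates" | extract_chat_ids` prints the group id.
- `tg_send_test -470646458` should then post a message into that group.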

telegram webhook

1. Get the webhook running first



git clone https://github.com/evan886/alertmanager-webhook-telegram-python.git
cd alertmanager-webhook-telegram-python/docker
docker build -t alertmanager-webhook-telegram:1.0 .
# chatid: the group id from getUpdates above (Telegram group ids are negative)
docker run -d --name telegram-bot \
	-e "bottoken=1157710367:AxxxxxxQ_t7botbVLa4imWrOV9LVHNYc" \
	-e "chatid=-470646458" \
	-e "username=evan" \
	-e "password=evanLxx123" \
	-p 9119:9119 alertmanager-webhook-telegram:1.0
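To test the container on its own, you can POST what Alertmanager would send. This sketch rests on a few assumptions:

- The service accepts Alertmanager's standard webhook JSON on `/alert`.
- It sits behind HTTP basic auth, using the `username`/`password` from `docker run`.
- Which fields the Python service actually reads is also an assumption, so `instance`, `summary` and `description` are all filled in.
- `WEBHOOK_URL`, `WEBHOOK_USER` and `WEBHOOK_PASS` are hypothetical variable names.

```shell
# Print a minimal Alertmanager-style webhook body ("version": "4").
sample_webhook_payload() {
  cat <<'EOF'
{
  "version": "4",
  "status": "firing",
  "receiver": "telegram-webhook",
  "groupLabels": {"alertname": "TestAlert"},
  "commonLabels": {"alertname": "TestAlert"},
  "commonAnnotations": {},
  "externalURL": "http://localhost:9093",
  "alerts": [
    {
      "status": "firing",
      "labels": {"alertname": "TestAlert", "instance": "test-host:9100", "severity": "warning"},
      "annotations": {"summary": "test alert", "description": "sent by hand to check the webhook"},
      "startsAt": "2021-10-21T03:21:00Z",
      "endsAt": "0001-01-01T00:00:00Z",
      "generatorURL": ""
    }
  ]
}
EOF
}

# POST the sample body to the webhook container started above
# (WEBHOOK_URL / WEBHOOK_USER / WEBHOOK_PASS are hypothetical names).
post_test_alert() {
  sample_webhook_payload | curl -s -u "${WEBHOOK_USER}:${WEBHOOK_PASS}" \
    -H 'Content-Type: application/json' \
    --data-binary @- "${WEBHOOK_URL:-http://127.0.0.1:9119/alert}"
}
```

If the message arrives in the group, the container, token and chat id are fine, and any remaining problem is on the Alertmanager side.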

Configuration

cat alertmanager/config.yml

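# Routing tree. This route receives every alert; further sub-routes can be added, e.g. send alerts labelled project: zhidaoAPP (a custom label set in the Prometheus alerting rules) to one receiver and project: baoxian to another
route:
  group_by: ['alertname'] # labels used to group alerts
  group_wait: 10s         # how long to wait before sending the first notification for a new group
  group_interval: 60s     # how long to wait before notifying about new alerts added to an already-notified group
  repeat_interval: 1h     # how often to resend a notification that is still firing; for email do not set this too low, or the SMTP server may reject the frequent mails
  receiver: 'telegram-webhook'       # default receiver; must match a name under receivers

# Receiver definitions
receivers:
  - name: 'telegram-webhook'
    webhook_configs:
    # user/password: the values given to the webhook container; host: where that container runs
    - url: 'http://evan:<password>@<webhook-host>:9119/alert'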
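After an edit, and before restarting, the file can be checked with `amtool check-config`. A running Alertmanager can then re-read it via `POST /-/reload`. This sketch assumes:

- Docker is available (amtool ships in the `prom/alertmanager` image).
- Alertmanager listens on 127.0.0.1:9093.

```shell
# Validate an Alertmanager config file with amtool from the prom/alertmanager image.
am_check_config() {
  local cfg="${1:-alertmanager/config.yml}"
  docker run --rm --entrypoint amtool \
    -v "$(cd "$(dirname "$cfg")" && pwd):/cfg:ro" \
    prom/alertmanager check-config "/cfg/$(basename "$cfg")"
}

# Ask a running Alertmanager to reload its config without a restart.
am_reload() {
  curl -fsS -X POST "${1:-http://127.0.0.1:9093}/-/reload"
}
```

Usage: `am_check_config alertmanager/config.yml && am_reload`.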

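Checking the result

Normally your Telegram group should receive messages at this point. If it does not, stop one node_exporter; if the alert still does not arrive, something is wrong.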
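Stopping an exporter works, but the alert only shows up after `for` plus `group_wait`. Alternatively, you can push a test alert straight into Alertmanager through its v2 API. Below is a minimal sketch; `AM_URL` and the label values are hypothetical.

```shell
# JSON body for POST /api/v2/alerts: a list of alerts, each needing at least labels.
am_test_alert_json() {
  printf '[{"labels":{"alertname":"%s","instance":"test-host:9100","severity":"warning"},"annotations":{"summary":"manual test alert"}}]\n' "$1"
}

# Push the test alert into Alertmanager; it then goes through route -> receiver.
am_fire_test_alert() {
  curl -fsS -X POST "${AM_URL:-http://127.0.0.1:9093}/api/v2/alerts" \
    -H 'Content-Type: application/json' \
    -d "$(am_test_alert_json "${1:-TestAlert}")"
}
```

After about `group_wait` (10s here), the message should reach the group. Because the alert carries no `endsAt`, Alertmanager treats it as resolved once `resolve_timeout` (5m by default) has passed.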

trouble

Alertmanager would not start and kept logging: level=error ts=2019-08-26T05:52:52.19072198Z caller=main.go:337 msg="Loading configuration file failed" file=/usr/local/prometheus/alertmanager/alertmanager.yml err="yaml: unmarshal errors:\n  line 12: field receivers not found in type config.plain". Fix, following Cong's approach: write the URL quoted, as - url: 'http://user:password@172.24.103.122:9119/alert'
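The error says `receivers` turned up where the config loader did not expect it. Most likely it was indented under another key instead of sitting at the top level. `amtool check-config` is the proper check. Below is a quick grep-based sketch (a heuristic, not a YAML parser) that lists the top-level keys.

```shell
# Print the keys that start in column 0 of a YAML file. In an Alertmanager
# config, "route" and "receivers" should both appear; if "receivers" is
# missing, it has been indented under another key.
top_level_keys() {
  grep -E '^[A-Za-z_]+:' "$1" | cut -d: -f1
}
```

Usage: `top_level_keys /usr/local/prometheus/alertmanager/alertmanager.yml`.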

bot see also

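https://prometheus.io/docs/alerting/latest/configuration/

https://core.telegram.org/bots

How to Create a Telegram Bot ID/Chat ID: https://techsoftcenter.com/how-to-create-a-telegram-bot-id-chat-id/

Prometheus Alertmanager Telegram Bot: https://toolbox.kali-linuxtr.net/prometheus-alertmanager-telegram-bot.tool

Creating a Telegram bot and sending messages: https://www.cnblogs.com/KillBugMe/p/13140226.html

How to create my own Telegram bot: https://www.teleme.io/articles/create_your_own_telegram_bot?hl=zh-hans

Managing host monitoring and alerts in Telegram: https://nova.moe/manage-host-alert-on-telegram-with-grafana/

https://github.com/inCaller/prometheus_bot

https://github.com/metalmatze/alertmanager-bot (the Alertmanager bot for Prometheus that notifies you on alerts)

Monitoring and alerting with prometheus + grafana + mysql + Telegram: https://blog.csdn.net/weixin_34242331/article/details/91875514

https://my.oschina.net/54188zz/blog/3030618

Custom Prometheus alerting rules: https://yunlzheng.gitbook.io/prometheus-book/parti-prometheus-ji-chu/alert/prometheus-alert-rule

prometheus alertmanager telegram: https://www.linux.org.ru/forum/general/14894302

Deploying Alertmanager for email/DingTalk/WeChat alerts: https://www.cnblogs.com/wangxu01/articles/11654836.html

prometheus alerting plugin alertmanager: https://www.cnblogs.com/xiaobaozi-95/p/10740511.html

Prometheus and Alertmanager configuration in practice: https://www.cnblogs.com/longcnblogs/p/9620733.html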

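WeChat

Notes on Prometheus WeChat alerts: https://blog.csdn.net/knight_zhou/article/details/106937276

webhook

Prometheus Alertmanager webhook configuration tutorial: https://blog.csdn.net/shida_csdn/article/details/81980021

A custom monitoring and alerting system with prometheus + alertmanager + webhook: https://blog.csdn.net/bluuusea/article/details/104619235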

* intro

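Alerting in the Prometheus architecture is split into two parts:

- The Prometheus server defines the alerting rules and generates the alerts.
- The Alertmanager component handles the alerts that Prometheus produces.

Alertmanager is thus the central alert-processing hub of the Prometheus ecosystem. It offers a number of built-in integrations for third-party notification services. It also supports webhooks, which let users extend alert handling in more customized ways.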

* install

** using docker or docker-compose

Use the compose file that ships with it

https://hub.docker.com/r/prom/alertmanager/dockerfile

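• docker only
 docker pull prom/alertmanager
 # recent images read /etc/alertmanager/alertmanager.yml by default; pass --config.file so the mounted file is used
 docker run --name alertmanager  -d -p 9093:9093   -v /path/to/config.yml:/etc/alertmanager/conf/config.yml prom/alertmanager --config.file=/etc/alertmanager/conf/config.yml --storage.path=/alertmanager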
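Once the container is up, Alertmanager's health endpoints and its status API show two things: whether it is running, and which configuration it actually loaded. The latter helps when the mounted file is not picked up. This sketch assumes the 9093 port mapping above.

```shell
# Liveness and readiness of a running Alertmanager.
am_health() {
  local base="${1:-http://127.0.0.1:9093}"
  curl -fsS "$base/-/healthy" && curl -fsS "$base/-/ready"
}

# Status JSON from the v2 API; its config.original field holds the YAML
# Alertmanager actually loaded.
am_loaded_config() {
  curl -fsS "${1:-http://127.0.0.1:9093}/api/v2/status"
}
```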



* conf

rules


 vim node-up.rules
groups:
- name: node-up
  rules:
  - alert: node-up
    expr: up{job="node-exporter"} == 0
    for: 15s
    labels:
      severity: 1
      team: node
    annotations:
      summary: "{{ $labels.instance }} has been down for more than 15s!"

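Explanation: this rule checks whether a node is alive.

- expr is a PromQL expression that tests whether the targets of job="node-exporter" are up.
- for means the alert stays Pending for 15s before it turns Firing. Once it is Firing, it is sent to AlertManager.
- labels and annotations attach extra identifying and descriptive information to the alert.

All added labels and annotations, together with any labels already set for this job in prometheus.yml, are automatically included in the notification email. For more details on rule configuration, see: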
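Prometheus only evaluates the rule file and forwards Firing alerts if `prometheus.yml` lists the rule files and the Alertmanager address. The sketch below writes such a fragment and checks a rule file with `promtool check rules`. The rules directory and the `alertmanager:9093` target are assumptions to adapt; merge the keys into your real prometheus.yml.

```shell
# Write a prometheus.yml fragment: which rule files to load and where
# Alertmanager is (paths and target are assumptions).
write_alerting_fragment() {
  cat > "${1:-prometheus-alerting.yml}" <<'EOF'
rule_files:
  - /etc/prometheus/rules/*.rules

alerting:
  alertmanagers:
    - static_configs:
        - targets: ['alertmanager:9093']
EOF
}

# Validate a rule file with promtool, which ships in the prom/prometheus image.
check_rules() {
  docker run --rm --entrypoint promtool \
    -v "$(cd "$(dirname "$1")" && pwd):/rules:ro" \
    prom/prometheus check rules "/rules/$(basename "$1")"
}
```

Usage: `check_rules ./node-up.rules`.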

# Alert resolution

* Custom alerting rules

** Custom CPU load alerting rule

  - alert: high_load-85per
    # busy % = 100 - average idle %; grouped by instance (and job) so {{ $labels.instance }} is filled in
    expr: (100-(avg(irate(node_cpu_seconds_total{mode="idle"}[5m])) by (instance, job)) * 100)  > 80
    #expr: sum(avg without (cpu)(irate(node_cpu{mode!='idle'}[5m]))) by (instance) > 0.81
    #expr: node_load1 > 0.2
    for: 10m
    labels:
      severity: page
    annotations:
      summary: "Instance {{ $labels.instance }} under high load"
      description: "{{ $labels.instance }} of job {{ $labels.job }} has been under high load for more than 10 minutes."

 Notifications (such as email) are only sent once the alert is FIRING.
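The expression turns the per-CPU idle rate into a busy percentage. You can apply the same arithmetic locally as a rough cross-check, using two samples of the aggregate `cpu` line in `/proc/stat` (the source of node_exporter's `node_cpu_seconds_total`). The result is only approximate, because the rule averages per-CPU rates.

```shell
# Busy-CPU percentage from two "cpu" lines of /proc/stat taken a few seconds apart.
# Columns after "cpu": user nice system idle iowait irq softirq steal guest guest_nice.
# Only the first eight are summed (guest time is already included in user/nice),
# matching the modes node_exporter exports; idle is awk field $5.
cpu_busy_pct() {
  printf '%s\n%s\n' "$1" "$2" | awk '
    {
      total = 0
      for (i = 2; i <= 9 && i <= NF; i++) total += $i
      idle[NR] = $5
      sum[NR] = total
    }
    END { printf "%.1f\n", 100 - (idle[2] - idle[1]) / (sum[2] - sum[1]) * 100 }'
}
```

Usage: `a=$(grep '^cpu ' /proc/stat); sleep 5; b=$(grep '^cpu ' /proc/stat); cpu_busy_pct "$a" "$b"`.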

** Custom memory alerting rule

# rules file: mind the leading spaces; this entry belongs inside a group's rules: list
  - alert: hostMemUsageAlert
    expr: ((node_memory_MemTotal_bytes -(node_memory_MemFree_bytes+node_memory_Buffers_bytes+node_memory_Cached_bytes) )/node_memory_MemTotal_bytes ) * 100 > 90
    #expr: (node_memory_MemTotal - node_memory_MemAvailable)/node_memory_MemTotal > 0.85
    for: 1m
    labels:
      severity: page
    annotations:
      summary: "Instance {{ $labels.instance }} MEM usage high"
      description: "{{ $labels.instance }} MEM usage above 90% (current value: {{ $value }})"


Custom alerting rules that worked (2020): https://www.shared-code.com/article/84

This expression works; the commented-out one above did not, most likely because it uses the old metric names without the _bytes suffix (node_exporter renamed its metrics in 0.16):

((node_memory_MemTotal_bytes -(node_memory_MemFree_bytes+node_memory_Buffers_bytes+node_memory_Cached_bytes) )/node_memory_MemTotal_bytes ) * 100 > 90
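The node_memory_* metrics come from `/proc/meminfo`, so the same formula can be applied there to compare the working expression against the host itself. Below is a minimal sketch.

```shell
# (MemTotal - (MemFree + Buffers + Cached)) / MemTotal * 100, from
# meminfo-formatted text on stdin. Values are in kB, so the units cancel.
# Exact field matching keeps "SwapCached:" from being read as "Cached:".
mem_used_pct() {
  awk '
    $1 == "MemTotal:" { total = $2 }
    $1 == "MemFree:"  { free = $2 }
    $1 == "Buffers:"  { buffers = $2 }
    $1 == "Cached:"   { cached = $2 }
    END { printf "%.1f\n", (total - (free + buffers + cached)) / total * 100 }'
}
```

Usage: `mem_used_pct < /proc/meminfo`.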

Common Prometheus alerting rule templates (part 3)


Custom Prometheus alerting rules

** Custom disk alert

  - alert: LowDiskSpaceNodeFilesystemUsage
    expr: 100 - (node_filesystem_free_bytes{mountpoint="/",fstype=~"ext4|xfs"} / node_filesystem_size_bytes{fstype=~"ext4|xfs"} * 100) > 80
    for: 1m
    labels:
      severity: warning
    annotations:
      summary: "Instance {{ $labels.instance }}:{{ $labels.mountpoint }} partition usage is high"
      description: "{{ $labels.instance }}: {{ $labels.job }}: partition {{ $labels.mountpoint }} is more than 80% used (current value: {{ $value }})"
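The disk expression is 100 - free/size*100 on the root filesystem. node_exporter derives both values from statfs (free blocks and total blocks, times the block size), so GNU `stat -f` gives the same ratio locally. Below is a sketch. `df` will usually show a somewhat different figure, because it counts only the space available to non-root users.

```shell
# 100 - free / size * 100, the same arithmetic as the disk rule.
used_pct() {
  awk -v free="$1" -v size="$2" 'BEGIN { printf "%.1f\n", 100 - free / size * 100 }'
}

# GNU stat: %f = free blocks (f_bfree), %b = total data blocks.
# The block size cancels out, so the ratio matches the byte metrics.
disk_used_pct() {
  # word splitting of stat's output into $1 and $2 is intended here
  set -- $(stat -f -c '%f %b' "${1:-/}")
  used_pct "$1" "$2"
}
```

Usage: `disk_used_pct /`.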

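see also

A new environment may also need grouping and the like.

Docker series: Grafana + Prometheus + Node-exporter server alerting center (part 2): https://www.cnblogs.com/hong-fithing/p/14797242.html

Prometheus monitoring: the Alertmanager alerting module: https://blog.csdn.net/y_xiao_/article/details/50818451?utm_medium=distribute.pc_relevant.none-task-blog-BlogCommendFromMachineLearnPai2-1.add_param_isCf&depth_1-utm_source=distribute.pc_relevant.none-task-blog-BlogCommendFromMachineLearnPai2-1.add_param_isCf

A brief look at Prometheus monitoring and alerting: https://my.oschina.net/OutOfMemory/blog/4706596

Alertmanager deployment and configuration: https://www.cnblogs.com/winstom/p/11940570.html

Prometheus alerting rules for node_exporter: https://blog.51cto.com/lookingdream/2504572

(Nasty error) Notes on an error when configuring alertmanager.yml for Prometheus: https://blog.csdn.net/weixin_30752699/article/details/101417735

Prometheus learning series (39): alert template examples: https://juejin.im/post/6844903880778579976

https://prometheus.io/docs/alerting/alertmanager/

The Prometheus Alertmanager alerting component: https://www.jianshu.com/p/239b145e2acc

good: prometheus + AlertManager for monitoring CPU and memory on multiple nodes: https://blog.csdn.net/qq_25178661/article/details/86690729

prometheus: common monitoring and alerting rules: https://blog.csdn.net/kozazyh/article/details/80636512

Prometheus with the Alertmanager alerting system: https://blog.51cto.com/jerrymin/2333824

Prometheus and Alertmanager configuration in practice: https://www.cnblogs.com/longcnblogs/p/9620733.html

alertmanager alerting rules explained: https://www.kancloud.cn/huyipow/prometheus/527563

prometheus: monitoring disks: https://blog.csdn.net/wang725/article/details/94174331

Deploying custom Prometheus monitoring: https://blog.csdn.net/mnasd/article/details/86694412

Custom Prometheus alerting rules: https://www.ctolib.com/docs/sfile/prometheus-book/alert/prometheus-alert-rule.html

Monitoring metrics and prometheus rules (work in progress): https://blog.csdn.net/weixin_33827731/article/details/92947113?utm_medium=distribute.pc_relevant_t0.none-task-blog-BlogCommendFromMachineLearnPai2-1.nonecase&depth_1-utm_source=distribute.pc_relevant_t0.none-task-blog-BlogCommendFromMachineLearnPai2-1.nonecase

Prometheus alerting rule examples: https://www.cnblogs.com/xiangsikai/p/11290000.html

Custom monitoring with prometheus: https://www.jianshu.com/p/1f05476ebcee

Accurately computing disk usage from node_exporter data in prometheus: https://blog.csdn.net/chubi7812/article/details/100612951?utm_medium=distribute.pc_relevant_t0.none-task-blog-BlogCommendFromMachineLearnPai2-1.nonecase&depth_1-utm_source=distribute.pc_relevant_t0.none-task-blog-BlogCommendFromMachineLearnPai2-1.nonecase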

k8s

Prometheus alerting with AlertManager in practice: https://www.qikqiak.com/post/alertmanager-of-prometheus-in-practice/