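Differences between revisions of “Alertmanager”
(30 intermediate revisions by the same user not shown) | |||
Line 3: | Line 3: | ||
Authorization password | Authorization password | ||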
+ | =DingTalk bot= | ||
+ | [https://www.cnblogs.com/hong-fithing/p/14868049.html Docker系列——Grafana+Prometheus+Node-exporter钉钉推送(四) ] | ||
+ | =telegram= | ||
+ | Via prome | ||
+ | |||
+ | Worth a look when there is time: | ||
+ | https://github.com/metalmatze/alertmanager-bot | ||
+ | ==* Create the Telegram bot and the alert group== | ||
+ | |||
+ | ===** Create the bot === | ||
+ | |||
+ | ====*** 202011: example of creating a bot==== | ||
+ | <pre> | ||
+ | #2020 | ||
+ | evan lai, [29.10.20 16:50] | ||
+ | /start | ||
+ | |||
+ | BotFather, [29.10.20 16:50] | ||
+ | I can help you create and manage Telegram bots. If you're new to the Bot API, please see the manual (https://core.telegram.org/bots). | ||
+ | |||
+ | You can control me by sending these commands: | ||
+ | |||
+ | /newbot - create a new bot | ||
+ | /mybots - edit your bots [beta] | ||
+ | |||
+ | Edit Bots | ||
+ | /setname - change a bot's name | ||
+ | /setdescription - change bot description | ||
+ | /setabouttext - change bot about info | ||
+ | /setuserpic - change bot profile photo | ||
+ | /setcommands - change the list of commands | ||
+ | /deletebot - delete a bot | ||
+ | |||
+ | Bot Settings | ||
+ | /token - generate authorization token | ||
+ | /revoke - revoke bot access token | ||
+ | /setinline - toggle inline mode (https://core.telegram.org/bots/inline) | ||
+ | /setinlinegeo - toggle inline location requests (https://core.telegram.org/bots/inline#location-based-results) | ||
+ | /setinlinefeedback - change inline feedback (https://core.telegram.org/bots/inline#collecting-feedback) settings | ||
+ | /setjoingroups - can your bot be added to groups? | ||
+ | /setprivacy - toggle privacy mode (https://core.telegram.org/bots#privacy-mode) in groups | ||
+ | |||
+ | Games | ||
+ | /mygames - edit your games (https://core.telegram.org/bots/games) [beta] | ||
+ | /newgame - create a new game (https://core.telegram.org/bots/games) | ||
+ | /listgames - get a list of your games | ||
+ | /editgame - edit a game | ||
+ | /deletegame - delete an existing game | ||
+ | |||
+ | BotFather, [29.10.20 16:50] | ||
+ | Alright, a new bot. How are we going to call it? Please choose a name for your bot. | ||
+ | |||
+ | evan lai, [29.10.20 16:50] | ||
+ | /newbot | ||
+ | |||
+ | evan lai, [29.10.20 16:51] | ||
+ | evan_alert_bot | ||
+ | |||
+ | BotFather, [29.10.20 16:51] | ||
+ | Good. Now let's choose a username for your bot. It must end in `bot`. Like this, for example: TetrisBot or tetris_bot. | ||
+ | |||
+ | evan lai, [29.10.20 16:51] | ||
+ | evan_alert_bot | ||
+ | |||
+ | BotFather, [29.10.20 16:51] | ||
+ | Done! Congratulations on your new bot. You will find it at t.me/evan_alert_bot. You can now add a description, about section and profile picture for your bot, see /help for a list of commands. By the way, when you've finished creating your cool bot, ping our Bot Support if you want a better username for it. Just make sure the bot is fully operational before you do this. | ||
+ | |||
+ | Use this token to access the HTTP API: | ||
+ | 1363904888:AAGeUIoxxRMlxk9zHUa2MTRi1My9HDBP69w | ||
+ | Keep your token secure and store it safely, it can be used by anyone to control your bot. | ||
+ | |||
+ | For a description of the Bot API, see this page: https://core.telegram.org/bots/api | ||
+ | </pre> | ||
+ | |||
+ | ====Useful information ==== | ||
+ | <pre> | ||
+ | |||
+ | |||
+ | evan lai, [10.05.20 21:55] | ||
+ | lxtx_prom_alert_bot | ||
+ | |||
+ | BotFather, [10.05.20 21:55] | ||
+ | Done! Congratulations on your new bot. You will find it at t.me/lxtx_prom_alert_bot. You can now add a description, about section and profile picture for your bot, see /help for a list of commands. By the way, when you've finished creating your cool bot, ping our Bot Support if you want a better username for it. Just make sure the bot is fully operational before you do this. | ||
+ | |||
+ | Use this token to access the HTTP API: | ||
+ | 1157710367:AAFD9YLsjdQ_t7botbVLa4xxWrOc9LVHNYc | ||
+ | Keep your token secure and store it safely, it can be used by anyone to control your bot. | ||
+ | |||
+ | For a description of the Bot API, see this page: https://core.telegram.org/bots/api | ||
+ | |||
+ | |||
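+ | Call the Bot API method getMe (URL form: api.telegram.org/bot + token + /getMe) to get the bot's own id | ||
+ | |||
+ | |||
+ | curl https://api.telegram.org/bot1157710367:AAFD9YLsjdQ_t7xxxxxLa4imWrOV9LVHNYc/getMe | ||
+ | |||
+ | |||
+ | # Note the literal "bot" prefix in front of the token | ||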
+ | sns:~# curl https://api.telegram.org/bot1157710367:AAFD9YLsjdQ_xxxxotbVLa4imWrOV9LVHNYc/getMe | ||
+ | {"ok":true,"result":{"id":1157710367,"is_bot":true,"first_name":"prom_alert_bot","username":"lxtx_prom_alert_bot","can_join_groups":true,"can_read_all_group_messages":false,"supports_inline_queries":false}} | ||
+ | |||
+ | </pre> | ||
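The getMe call above can also be scripted. The following is a minimal Python sketch, not part of the original notes: `bot_api_url` and `parse_get_me` are hypothetical helper names, and the sample body is the curl output shown above. It only builds the URL and parses a response; it does not contact Telegram.

```python
import json

API_BASE = "https://api.telegram.org"


def bot_api_url(token, method):
    """Build a Bot API URL; the token is prefixed with the literal 'bot'."""
    return f"{API_BASE}/bot{token}/{method}"


def parse_get_me(body):
    """Return (bot id, username) from a getMe JSON response body."""
    data = json.loads(body)
    if not data.get("ok"):
        raise ValueError(f"getMe failed: {data}")
    result = data["result"]
    return result["id"], result["username"]


# Response body copied from the curl output above.
sample = (
    '{"ok":true,"result":{"id":1157710367,"is_bot":true,'
    '"first_name":"prom_alert_bot","username":"lxtx_prom_alert_bot",'
    '"can_join_groups":true,"can_read_all_group_messages":false,'
    '"supports_inline_queries":false}}'
)

print(bot_api_url("TOKEN", "getMe"))
print(parse_get_me(sample))
```

The bot id returned by getMe is the numeric part of the token before the colon.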
+ | |||
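+ | === Create the group === | ||
+ | <pre> | ||
+ | Get the group ID | ||
+ | |||
+ | Create a new group in Telegram, add the newly created bot (prom_alert_bot) as a member, then call the Bot API method getUpdates to get the group ID | ||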
+ | |||
+ | curl https://api.telegram.org/bot1157710367:AAFD9YLsjdQ_t7botbVLa4imWrOV9LVHNYc/getUpdates | ||
+ | {"ok":true,"result":[{"update_id":367831744, | ||
+ | "message":{"message_id":1,"from":{"id":796717144,"is_bot":false,"first_name":"evan","last_name":"lai","username":"linuxsa"},"chat":{"id":-470646458,"title":"alerm","type":"group","all_members_are_administrators":true},"date":1597202656,"new_chat_participant":{"id":1157710367,"is_bot":true,"first_name":"prom_alert_bot","username":"lxtx_prom_alert_bot"},"new_chat_member":{"id":1157710367,"is_bot":true,"first_name":"prom_alert_bot","username":"lxtx_prom_alert_bot"},"new_chat_members":[{"id":1157710367,"is_bot":true,"first_name":"prom_alert_bot","username":"lxtx_prom_alert_bot"}]}}]} | ||
+ | </pre> | ||
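In the response above, the group ID is the `chat.id` field (here -470646458; group chat IDs are negative). As a rough sketch of pulling it out programmatically, assuming a response of the same shape, the hypothetical helper `group_chat_ids` below collects every group chat found in a getUpdates body. The sample is a trimmed copy of the output above.

```python
import json


def group_chat_ids(body):
    """Map chat id -> title for every group chat in a getUpdates response."""
    data = json.loads(body)
    if not data.get("ok"):
        raise ValueError(f"getUpdates failed: {data}")
    chats = {}
    for update in data["result"]:
        chat = update.get("message", {}).get("chat", {})
        if chat.get("type") in ("group", "supergroup"):
            chats[chat["id"]] = chat.get("title", "")
    return chats


# Trimmed copy of the getUpdates output shown above.
sample = json.dumps({
    "ok": True,
    "result": [{
        "update_id": 367831744,
        "message": {
            "message_id": 1,
            "chat": {"id": -470646458, "title": "alerm", "type": "group"},
            "date": 1597202656,
        },
    }],
})

print(group_chat_ids(sample))
```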
+ | |||
+ | ==telegram webhook == | ||
+ | === 1. Get the webhook running first === | ||
+ | <pre> | ||
+ | |||
+ | |||
+ | git clone https://github.com/evan886/alertmanager-webhook-telegram-python.git | ||
+ | cd alertmanager-webhook-telegram-python/docker | ||
+ | docker build -t alertmanager-webhook-telegram:1.0 . | ||
+ | docker run -d --name telegram-bot \ | ||
+ | -e "bottoken=1157710367:AxxxxxxQ_t7botbVLa4imWrOV9LVHNYc" \ | ||
+ | -e "chatid=4706458" \ | ||
+ | -e "username=evan" \ | ||
+ | -e "password=evanLxx123" \ | ||
+ | -p 9119:9119 alertmanager-webhook-telegram:1.0 | ||
+ | </pre> | ||
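What the container does internally is not shown in these notes. As a generic illustration only, a webhook of this kind receives Alertmanager's JSON payload (a top-level `alerts` list whose items carry `status`, `labels` and `annotations`) and turns it into message text. The sketch below assumes that payload shape; `format_alert` and `format_payload` are hypothetical names, not functions from alertmanager-webhook-telegram-python.

```python
def format_alert(alert):
    """Render one alert from an Alertmanager webhook payload as plain text."""
    labels = alert.get("labels", {})
    annotations = alert.get("annotations", {})
    status = alert.get("status", "unknown").upper()
    lines = [
        f"[{status}] {labels.get('alertname', '(no alertname)')}",
        f"instance: {labels.get('instance', '-')}",
    ]
    summary = annotations.get("summary")
    if summary:
        lines.append(summary)
    return "\n".join(lines)


def format_payload(payload):
    """Join all alerts in the payload into one message body."""
    return "\n\n".join(format_alert(a) for a in payload.get("alerts", []))


# Hand-written example payload; the values are illustrative only.
example_payload = {
    "version": "4",
    "status": "firing",
    "receiver": "telegram-webhook",
    "alerts": [{
        "status": "firing",
        "labels": {"alertname": "node-up", "instance": "10.0.0.1:9100"},
        "annotations": {"summary": "10.0.0.1:9100 has been down for more than 15s"},
    }],
}

print(format_payload(example_payload))
```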
+ | |||
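+ | ==== Configuration ==== | ||
+ | <pre> | ||
+ | cat alertmanager/config.yml | ||
+ | |||
+ | # Route tree. This top-level route receives every alert; child routes can be added below it, e.g. to decide who gets alerts with project: zhidaoAPP (a custom label set in the Prometheus alert rules) and who gets project: baoxian | ||
+ | route: | ||
+ |   group_by: ['alertname'] # labels used to group alerts | ||
+ |   group_wait: 10s # how long to wait before sending the first notification for a new group | ||
+ |   group_interval: 60s # how long to wait before notifying about new alerts added to an already notified group | ||
+ |   repeat_interval: 1h # how often to resend a notification that is still firing; for email, do not set this too low, or the SMTP server may reject the excess mail | ||
+ |   receiver: 'telegram-webhook' # name of the receiver; must match a name under receivers below | ||
+ | |||
+ | # Receivers | ||
+ | receivers: | ||
+ | - name: 'telegram-webhook' | ||
+ |   webhook_configs: | ||
+ |   - url: 'http://evan:PASSWORD@WEBHOOK_HOST:9119/alert' # replace PASSWORD and WEBHOOK_HOST with your own values | ||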
+ | |||
+ | </pre> | ||
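The comment in the config mentions sending alerts to different people by `project` label. As a simplified model of how a route tree picks a receiver (the first matching child route wins, otherwise the parent's receiver is used), the sketch below may help. It ignores `continue`, regex matchers and grouping, and the receiver names `zhidao-team` and `baoxian-team` are made up for illustration.

```python
def pick_receiver(route, labels, inherited=None):
    """Return the receiver for an alert's labels in a simplified route tree."""
    receiver = route.get("receiver", inherited)
    for child in route.get("routes", []):
        match = child.get("match", {})
        # A child matches when every label in its matcher equals the alert's label.
        if all(labels.get(k) == v for k, v in match.items()):
            return pick_receiver(child, labels, receiver)
    return receiver


# Hypothetical route tree extending the config above with two child routes.
route = {
    "receiver": "telegram-webhook",
    "routes": [
        {"match": {"project": "zhidaoAPP"}, "receiver": "zhidao-team"},
        {"match": {"project": "baoxian"}, "receiver": "baoxian-team"},
    ],
}

print(pick_receiver(route, {"alertname": "node-up", "project": "baoxian"}))
print(pick_receiver(route, {"alertname": "node-up"}))
```

An alert without a `project` label falls through to the top-level receiver, which is why the root route must always name one.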
+ | |||
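+ | === Check the result === | ||
+ | At this point alerts should be arriving in your Telegram group. If nothing shows up, stop one node_exporter to trigger an alert; if there is still no message, something is misconfigured. | ||
+ | |||
+ | == trouble== | ||
+ | Alertmanager would not start and kept logging this error: level=error ts=2019-08-26T05:52:52.19072198Z caller=main.go:337 msg="Loading configuration file failed" file=/usr/local/prometheus/alertmanager/alertmanager.yml err="yaml: unmarshal errors:\n line 12: field receivers not found in type config.plain" The fix, following Cong's suggestion, was to quote the URL: - url: 'http://USER:PASSWORD@172.24.103.122:9119/alert' | ||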
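Because the webhook URL embeds a user name and password, characters such as `@` or `:` in the password can make the URL ambiguous. One way to avoid that, sketched here as an assumption rather than something the original notes did, is to percent-encode the credentials before pasting the URL into the config. `webhook_url` is a hypothetical helper and the password is a made-up example.

```python
from urllib.parse import quote, urlsplit


def webhook_url(user, password, host, port, path="/alert"):
    """Build an http URL with percent-encoded basic-auth credentials."""
    user_enc = quote(user, safe="")
    password_enc = quote(password, safe="")
    return f"http://{user_enc}:{password_enc}@{host}:{port}{path}"


url = webhook_url("evan", "p@ss:word", "172.24.103.122", 9119)
print(url)

# The host and port are still parsed correctly after encoding.
parts = urlsplit(url)
print(parts.hostname, parts.port)
```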
+ | |||
+ | == bot see also== | ||
+ | https://prometheus.io/docs/alerting/latest/configuration/ | ||
+ | |||
+ | https://core.telegram.org/bots | ||
+ | |||
+ | [https://techsoftcenter.com/how-to-create-a-telegram-bot-id-chat-id/ How to Create a Telegram Bot ID/Chat ID] | ||
+ | |||
+ | [https://toolbox.kali-linuxtr.net/prometheus-alertmanager-telegram-bot.tool Prometheus Alertmanager Telegram Bot] | ||
+ | |||
+ | [https://www.cnblogs.com/KillBugMe/p/13140226.html 创建telegram 机器人 并发送消息] | ||
+ | |||
+ | [https://www.teleme.io/articles/create_your_own_telegram_bot?hl=zh-hans 如何创建我自己的电报机器人(Telegram Bot)] | ||
+ | |||
+ | [https://nova.moe/manage-host-alert-on-telegram-with-grafana/ 在 Telegram 中管理主机监控和警报信息] | ||
+ | |||
+ | https://github.com/inCaller/prometheus_bot | ||
+ | |||
+ | https://github.com/metalmatze/alertmanager-bot | ||
+ | |||
+ | [https://blog.csdn.net/weixin_34242331/article/details/91875514 基于prometheus + grafana + mysql + Telegram 监控告警] | ||
+ | |||
+ | https://my.oschina.net/54188zz/blog/3030618 | ||
+ | |||
+ | [https://yunlzheng.gitbook.io/prometheus-book/parti-prometheus-ji-chu/alert/prometheus-alert-rule 自定义Prometheus告警规则] | ||
+ | |||
+ | [https://www.linux.org.ru/forum/general/14894302 prometheus alertmanager telegram ] | ||
+ | |||
+ | [https://www.cnblogs.com/wangxu01/articles/11654836.html 部署Alertmanager实现邮件/钉钉/微信报警] | ||
+ | |||
+ | [https://www.cnblogs.com/xiaobaozi-95/p/10740511.html prometheus告警插件-alertmanager] | ||
+ | |||
+ | [https://www.cnblogs.com/longcnblogs/p/9620733.html Prometheus 和 Alertmanager实战配置] | ||
+ | |||
+ | == WeChat == | ||
+ | |||
+ | [https://blog.csdn.net/knight_zhou/article/details/106937276 Prometheus 微信告警注意事项] | ||
+ | =webhook= | ||
+ | |||
+ | |||
+ | [https://blog.csdn.net/shida_csdn/article/details/81980021 prometheus alertmanager webhook 配置教程] | ||
+ | |||
+ | [https://blog.csdn.net/bluuusea/article/details/104619235 prometheus+alertmanager+webhook实现自定义监控报警系统] | ||
=* intro = | =* intro = | ||
Line 18: | Line 225: | ||
docker pull prom/alertmanager | docker pull prom/alertmanager | ||
docker run --name alertmanager -d -p 9093:9093 -v /path/to/config.yml:/etc/alertmanager/conf/config.yml prom/alertmanager | docker run --name alertmanager -d -p 9093:9093 -v /path/to/config.yml:/etc/alertmanager/conf/config.yml prom/alertmanager | ||
+ | |||
+ | |||
+ | |||
+ | |||
=* conf = | =* conf = | ||
<pre> | <pre> | ||
Line 40: | Line 251: | ||
# Alert resolved | # Alert resolved | ||
− | <pre> | + | </pre> |
=* Custom alert rules= | =* Custom alert rules= | ||
Line 85: | Line 296: | ||
==** Custom disk alert== | ==** Custom disk alert== | ||
+ | <pre> | ||
+ | - alert: LowDiskSpaceNodeFilesystemUsage | ||
+ |   expr: 100 - (node_filesystem_free_bytes{mountpoint="/",fstype=~"ext4|xfs"} / node_filesystem_size_bytes{fstype=~"ext4|xfs"} * 100) > 80 | ||
+ |   for: 1m | ||
+ |   labels: | ||
+ |     severity: warning | ||
+ |   annotations: | ||
+ |     summary: "Instance {{ $labels.instance }}: partition {{ $labels.mountpoint }} usage is too high" | ||
+ |     description: "{{ $labels.instance }} : {{ $labels.job }} : partition {{ $labels.mountpoint }} is more than 80% used (current value: {{ $value }})" | ||
+ | </pre> | ||
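The arithmetic in the `expr` above is easy to check by hand. This small Python sketch mirrors it with plain numbers; the function names and the sample figures are illustrative assumptions, not values from a real host.

```python
def disk_used_percent(free_bytes, size_bytes):
    """Mirror of the rule's expression: 100 - (free / size * 100)."""
    if size_bytes <= 0:
        raise ValueError("size_bytes must be positive")
    return 100 - (free_bytes / size_bytes * 100)


def should_alert(free_bytes, size_bytes, threshold=80.0):
    """True when usage is strictly above the threshold, as in '> 80'."""
    return disk_used_percent(free_bytes, size_bytes) > threshold


GIB = 2 ** 30
# 12.5 GiB free of 100 GiB: above the 80% threshold.
print(disk_used_percent(12.5 * GIB, 100 * GIB), should_alert(12.5 * GIB, 100 * GIB))
# 25 GiB free of 100 GiB: below the threshold.
print(disk_used_percent(25 * GIB, 100 * GIB), should_alert(25 * GIB, 100 * GIB))
```

Note that the rule also needs the condition to hold for the whole `for: 1m` window before it fires; the sketch only covers a single evaluation.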
+ | |||
+ | =see also= | ||
+ | A new environment may also need some alert grouping and the like. | ||
+ | |||
+ | [https://www.cnblogs.com/hong-fithing/p/14797242.html Docker系列——Grafana+Prometheus+Node-exporter服务器告警中心(二) ] | ||
+ | |||
+ | [https://blog.csdn.net/y_xiao_/article/details/50818451?utm_medium=distribute.pc_relevant.none-task-blog-BlogCommendFromMachineLearnPai2-1.add_param_isCf&depth_1-utm_source=distribute.pc_relevant.none-task-blog-BlogCommendFromMachineLearnPai2-1.add_param_isCf Prometheus监控 - Alertmanager报警模块] | ||
+ | |||
+ | [https://my.oschina.net/OutOfMemory/blog/4706596 Prometheus监控告警浅析] | ||
+ | |||
+ | [https://www.cnblogs.com/winstom/p/11940570.html Alertmanager 部署配置] | ||
+ | |||
+ | [https://blog.51cto.com/lookingdream/2504572 Prometheus监控node_exporter的告警规则] | ||
+ | |||
+ | |||
+ | [https://blog.csdn.net/weixin_30752699/article/details/101417735 (坑爹错误)记录prometheus中配置alertmanager.yml一次报错] | ||
+ | |||
+ | [https://juejin.im/post/6844903880778579976 Prometheus学习系列(三十九)之报警模板例子 ] | ||
+ | |||
+ | https://prometheus.io/docs/alerting/alertmanager/ | ||
+ | |||
+ | [https://www.jianshu.com/p/239b145e2acc Prometheus Alertmanager报警组件] | ||
+ | |||
+ | [https://blog.csdn.net/qq_25178661/article/details/86690729 good-prometheus + AlertManager 实现对多node节点CPU和内存信息的监控] | ||
+ | |||
+ | [https://blog.csdn.net/kozazyh/article/details/80636512 prometheus-常用的监控告警规则] | ||
+ | |||
+ | [https://blog.51cto.com/jerrymin/2333824 Prometheus配合Alertmanager报警系统] | ||
+ | |||
+ | [https://www.cnblogs.com/longcnblogs/p/9620733.html Prometheus 和 Alertmanager实战配置] | ||
+ | |||
+ | [https://www.kancloud.cn/huyipow/prometheus/527563 alertmanager报警规则详解] | ||
+ | |||
+ | |||
+ | [https://blog.csdn.net/wang725/article/details/94174331 prometheus - 监控磁盘] | ||
+ | |||
+ | [https://blog.csdn.net/mnasd/article/details/86694412 Prometheus自定义监控部署] | ||
+ | |||
+ | [https://www.ctolib.com/docs/sfile/prometheus-book/alert/prometheus-alert-rule.html 自定义Prometheus告警规则] | ||
+ | |||
+ | [https://blog.csdn.net/weixin_33827731/article/details/92947113?utm_medium=distribute.pc_relevant_t0.none-task-blog-BlogCommendFromMachineLearnPai2-1.nonecase&depth_1-utm_source=distribute.pc_relevant_t0.none-task-blog-BlogCommendFromMachineLearnPai2-1.nonecase 监控指标以及prometheus规则-不断完善中] | ||
+ | |||
+ | [https://www.cnblogs.com/xiangsikai/p/11290000.html Prometheus 编写告警规则案例] | ||
+ | |||
+ | [https://www.jianshu.com/p/1f05476ebcee 使用prometheus自定义监控] | ||
+ | |||
+ | [https://blog.csdn.net/chubi7812/article/details/100612951?utm_medium=distribute.pc_relevant_t0.none-task-blog-BlogCommendFromMachineLearnPai2-1.nonecase&depth_1-utm_source=distribute.pc_relevant_t0.none-task-blog-BlogCommendFromMachineLearnPai2-1.nonecase prometheus通过node_exporter抓取的数据准确计算磁盘使用率] | ||
+ | |||
+ | =k8s = | ||
+ | |||
+ | [https://www.qikqiak.com/post/alertmanager-of-prometheus-in-practice/ Prometheus报警AlertManager实战] | ||
+ | [[category:ops]] [[category:container]] [[category:prom]] |
Latest revision as of 11:21, 21 October 2021 (Thursday)
Contents
* my email
@126.com
Authorization password
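* intro
Prometheus splits alerting into two parts: alert rules are defined and alerts are generated in Prometheus Server, while the Alertmanager component handles the alerts that Prometheus produces. Alertmanager is thus the central place for alert handling in the Prometheus ecosystem. It ships with a number of built-in integrations for third-party notification channels and also supports webhook notifications, through which users can extend alerting in more customized ways.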
* ins
** using docker or docker-compose
Use the bundled compose file
https://hub.docker.com/r/prom/alertmanager/dockerfile
- docker only
docker pull prom/alertmanager
docker run --name alertmanager -d -p 9093:9093 -v /path/to/config.yml:/etc/alertmanager/conf/config.yml prom/alertmanager
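* conf
rules
vim node-up.rules
groups:
- name: node-up
  rules:
  - alert: node-up
    expr: up{job="node-exporter"} == 0
    for: 15s
    labels:
      severity: 1
      team: node
    annotations:
      summary: "{{ $labels.instance }} has been down for more than 15s!"
Note: this rule checks whether a node is alive. expr is a PromQL expression that tests whether the targets of job="node-exporter" are up. for means the alert stays in the Pending state for 15s before it becomes Firing; once it is Firing, it is sent to Alertmanager. labels and annotations attach extra identifying and descriptive information to the alert. All of these labels and annotations, plus the labels already set for the job in prometheus.yml, are included automatically in the email body. For more detail on rule configuration, see
# Alert resolved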
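The Pending to Firing transition described above can be modelled in a few lines. This is a simplified sketch of the `for` behaviour under stated assumptions (one series, evaluations at fixed timestamps); `alert_states` is a hypothetical name and the code is not taken from Prometheus.

```python
def alert_states(samples, for_seconds):
    """Return the alert state after each evaluation.

    samples: list of (timestamp_seconds, condition_true) in time order.
    An alert is 'pending' from the first evaluation where the condition
    holds and becomes 'firing' once it has held for at least for_seconds.
    """
    states = []
    active_since = None
    for ts, condition in samples:
        if not condition:
            # The condition cleared: the alert resets completely.
            active_since = None
            states.append("inactive")
            continue
        if active_since is None:
            active_since = ts
        if ts - active_since >= for_seconds:
            states.append("firing")
        else:
            states.append("pending")
    return states


# Evaluations every 5s; the target goes down at t=5 and recovers at t=25.
samples = [(0, False), (5, True), (10, True), (15, True), (20, True), (25, False)]
print(alert_states(samples, for_seconds=15))
```

With `for: 15s`, the alert first fires at t=20, which is 15 seconds after the condition first held.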
* Custom alert rules
** Custom CPU load alert rule
- alert: high_load-85per
  expr: (100 - (avg(irate(node_cpu_seconds_total{mode="idle"}[5m])) by (instance, job)) * 100) > 80
  #expr: sum(avg without (cpu)(irate(node_cpu{mode!='idle'}[5m]))) by (instance) > 0.81
  #expr: node_load1 > 0.2
  for: 10m
  labels:
    severity: page
  annotations:
    summary: "Instance {{ $labels.instance }} under high load"
    description: "{{ $labels.instance }} of job {{ $labels.job }} is under high load more than 10 minutes."
An email is sent only once the alert is FIRING.
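To see what the CPU expression computes, the sketch below reproduces it on hand-made counter samples: an instant rate from the last two samples of each CPU's idle counter, averaged, then turned into a usage percentage. The function names and sample values are illustrative assumptions; the counter-reset handling follows how `irate` treats a decreasing counter.

```python
def irate(samples):
    """Per-second rate from the last two (timestamp, counter_value) samples."""
    (t1, v1), (t2, v2) = samples[-2], samples[-1]
    if t2 <= t1:
        raise ValueError("timestamps must be increasing")
    delta = v2 - v1
    if delta < 0:
        # Counter reset: the counter restarted from zero, so use the new value.
        delta = v2
    return delta / (t2 - t1)


def cpu_usage_percent(idle_samples_per_cpu):
    """100 - avg(idle rate) * 100, matching the shape of the rule's expr."""
    rates = [irate(s) for s in idle_samples_per_cpu]
    return 100 - (sum(rates) / len(rates)) * 100


# Two CPUs, samples 15s apart: idle fractions of 0.25 and 0.125.
cpu0 = [(0, 1000.0), (15, 1003.75)]
cpu1 = [(0, 2000.0), (15, 2001.875)]
print(cpu_usage_percent([cpu0, cpu1]))
```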
** Custom memory alert rule
# rules file: mind the leading spaces (indentation)
- alert: hostMemUsageAlert
  expr: ((node_memory_MemTotal_bytes - (node_memory_MemFree_bytes + node_memory_Buffers_bytes + node_memory_Cached_bytes)) / node_memory_MemTotal_bytes) * 100 > 90
  #expr: (node_memory_MemTotal - node_memory_MemAvailable)/node_memory_MemTotal > 0.85
  for: 1m
  labels:
    severity: page
  annotations:
    summary: "Instance {{ $labels.instance }} MEM usage high"
    description: "{{ $labels.instance }} MEM usage above 90% (current value: {{ $value }})"
A custom alert rule that worked (2020): https://www.shared-code.com/article/84
This expression worked; the commented-out one above did not (it uses metric names without the _bytes suffix, which node_exporter 0.16 and later no longer export).
((node_memory_MemTotal_bytes -(node_memory_MemFree_bytes+node_memory_Buffers_bytes+node_memory_Cached_bytes) )/node_memory_MemTotal_bytes ) * 100 > 90
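The two memory expressions in this section use different scales: the working one yields a percentage compared against 90, while the commented-out one yields a ratio compared against 0.85. The sketch below, with hypothetical function names and made-up sample values in GiB, shows both calculations side by side.

```python
def used_percent_from_free(total, free, buffers, cached):
    """(total - (free + buffers + cached)) / total * 100, a 0-100 percentage."""
    return (total - (free + buffers + cached)) / total * 100


def used_ratio_from_available(total, available):
    """(total - available) / total, a 0-1 ratio."""
    return (total - available) / total


# Made-up host: 16 GiB total, 1 free, 0.5 buffers, 2.5 cached, 2 available.
percent = used_percent_from_free(16, 1, 0.5, 2.5)
ratio = used_ratio_from_available(16, 2)
print(percent, percent > 90)
print(ratio, ratio > 0.85)
```

On this sample the two rules disagree, which is a reminder that the threshold has to match the scale and the definition of "used" in the expression it is paired with.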