Difference between the pages "Zabbix 5 4 and 3 LTS Installation Tutorial for Beginners" and "Alertmanager"

From the linux中国网 wiki
(Difference between pages)
 
 
Line 1: Line 1:
==On-site resources==
+
=*  my email =
[[Zabbix 调用API 批量添加主机]]
+
@126.com
== zabbix server ==
 
===pre ===
 
These are the official installation documents; be sure to read them
 
  
https://www.zabbix.com/documentation/3.0/manual/installation/install
+
Authorization password
 +
=telegram=
 +
Via prome
  
https://www.zabbix.org/wiki/InstallOnCentOS_RHEL
+
Worth a look when there is time
 +
https://github.com/metalmatze/alertmanager-bot
 +
==* Create the tg bot and the alert group==
  
<pre>wget -c https://jaist.dl.sourceforge.net/project/zabbix/ZABBIX%20Latest%20Stable/3.0.8/zabbix-3.0.8.tar.gz
+
===** Create the bot ===
wget -c http://tenet.dl.sourceforge.net/project/zabbix/ZABBIX%20Latest%20Stable/2.2.13/zabbix-2.2.13.tar.gz</pre>
 
  
===#Configure PHP variables===
+
====*** 202011 example of creating a bot====
<pre>vi /etc/php.ini
+
<pre>
date.timezone = Asia/Shanghai
+
#2020
post_max_size = 32M
+
evan lai, [29.10.20 16:50]
max_execution_time = 300
+
/start
max_input_time = 300
 
Note: restart nginx and php after changing these settings</pre>
 
 
 
===#Install the components Zabbix needs ===
 
<pre>yum -y install net-snmp-devel curl-devel
 
#yum -y install curl curl-devel net-snmp net-snmp-devel perl-DBI php-gd php-xml php-bcmath
 
groupadd zabbix && useradd -g zabbix zabbix
 
tar xvf zabbix-3.0.8.tar.gz && cd zabbix-3.0.8
 
#tar xvf zabbix-2.2.13.tar.gz && cd zabbix-2.2.13
 
  
#./configure --enable-server --enable-agent --with-mysql --enable-ipv6 --with-net-snmp --with-libcurl --with-libxml2
+
BotFather, [29.10.20 16:50]
 +
I can help you create and manage Telegram bots. If you're new to the Bot API, please see the manual (https://core.telegram.org/bots).
  
## Or use the default install path; a separate make step is not needed
+
You can control me by sending these commands:
./configure --sysconfdir=/etc/zabbix --enable-server --enable-proxy --enable-agent --with-mysql=/usr/local/mysql/bin/mysql_config --with-net-snmp --with-libcurl --with-libxml2 && make install
 
</pre>
 
===Add the ports for the Zabbix services (optional, but officially recommended)===
 
<pre>cat >>/etc/services<< EOF
 
zabbix-agent 10050/tcp #Zabbix Agent
 
zabbix-agent 10050/udp #Zabbix Agent
 
zabbix-trapper 10051/tcp #Zabbix Trapper
 
zabbix-trapper 10051/udp #Zabbix Trapper
 
EOF</pre>
 
  
===配置文件===
+
/newbot - create a new bot
<pre>## It seems --sysconfdir=/etc/zabbix is what takes effect; with it, the commands below are not needed
+
/mybots - edit your bots [beta]
#vim /usr/local/etc/zabbix_server.conf
 
#cd zabbix-2.0.7
 
  
#mkdir /etc/zabbix
+
Edit Bots
#cp conf/*.conf /etc/zabbix
+
/setname - change a bot's name
 +
/setdescription - change bot description
 +
/setabouttext - change bot about info
 +
/setuserpic - change bot profile photo
 +
/setcommands - change the list of commands
 +
/deletebot - delete a bot
  
mkdir /var/log/zabbix ;chown zabbix:zabbix /var/log/zabbix;
+
Bot Settings
 +
/token - generate authorization token
 +
/revoke - revoke bot access token
 +
/setinline - toggle inline mode (https://core.telegram.org/bots/inline)
 +
/setinlinegeo - toggle inline location requests (https://core.telegram.org/bots/inline#location-based-results)
 +
/setinlinefeedback - change inline feedback (https://core.telegram.org/bots/inline#collecting-feedback) settings
 +
/setjoingroups - can your bot be added to groups?
 +
/setprivacy - toggle privacy mode (https://core.telegram.org/bots#privacy-mode) in groups
  
 +
Games
 +
/mygames - edit your games (https://core.telegram.org/bots/games) [beta]
 +
/newgame - create a new game (https://core.telegram.org/bots/games)
 +
/listgames - get a list of your games
 +
/editgame - edit a game
 +
/deletegame - delete an existing game
  
# Zabbix web code
+
BotFather, [29.10.20 16:50]
mkdir -p /data/www/zabbix;
+
Alright, a new bot. How are we going to call it? Please choose a name for your bot.
cp -r frontends/php/* /data/www/zabbix
 
  
Set the database user name and password that Zabbix connects with
+
evan lai, [29.10.20 16:50]
vi /etc/zabbix/zabbix_server.conf
+
/newbot
  
DBHost=127.0.0.1
+
evan lai, [29.10.20 16:51]
DBName=zabbix
+
evan_alert_bot
DBUser=zabbix
 
DBPassword='123'
 
DBPort=3306 # if the database is reached through a socket file, the socket path can go here
 
  
Add the database library directory to /etc/ld.so.conf and make it take effect
+
BotFather, [29.10.20 16:51]
echo '/usr/local/mysql/lib/mysql/' >> /etc/ld.so.conf
+
Good. Now let's choose a username for your bot. It must end in `bot`. Like this, for example: TetrisBot or tetris_bot.
  
ldconfig
+
evan lai, [29.10.20 16:51]
 +
evan_alert_bot
  
Link the Zabbix start, stop and restart scripts so the system can find them
+
BotFather, [29.10.20 16:51]
 +
Done! Congratulations on your new bot. You will find it at t.me/evan_alert_bot. You can now add a description, about section and profile picture for your bot, see /help for a list of commands. By the way, when you've finished creating your cool bot, ping our Bot Support if you want a better username for it. Just make sure the bot is fully operational before you do this.
  
Create symlinks for the Zabbix server programs (probably unnecessary with the default configure options I used)
+
Use this token to access the HTTP API:
cd /usr/local/zabbix/bin/
+
1363904888:AAGeUIoxxRMlxk9zHUa2MTRi1My9HDBP69w
for i in *;do ln -s /usr/local/zabbix/bin/${i} /usr/bin/${i};done
+
Keep your token secure and store it safely, it can be used by anyone to control your bot.
cd /usr/local/zabbix/sbin/
 
for i in *;do ln -s /usr/local/zabbix/sbin/${i} /usr/sbin/${i};done
 
  
Copy the Zabbix server and agent init scripts into /etc/init.d.
+
For a description of the Bot API, see this page: https://core.telegram.org/bots/api
cd misc/init.d/
+
</pre>
cp fedora/core/zabbix_server /etc/init.d/
 
cp fedora/core/zabbix_agentd /etc/init.d/
 
chmod +x /etc/init.d/zabbix_agentd
 
chmod +x /etc/init.d/zabbix_server</pre>
 
 
 
===3 Create Zabbix database===
 
<pre>SQL scripts are provided for creating database schema and inserting the dataset
 
#https://www.zabbix.com/documentation/3.0/manual/appendix/install/db_scripts
 
#https://www.zabbix.com/documentation/2.2/manual/appendix/install/db_scripts
 
 
 
mysql>create database zabbix character set utf8 collate utf8_bin;;grant all on zabbix.* to zabbix@localhost identified by '123';flush privileges;
 
 
 
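Import the data shipped in the Zabbix source package into the newly created zabbix database

## This differs a little from older versions: schema.sql holds the table structure and must be imported first.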
 
  
. /etc/profile
+
====有用的信息 ====
cd ../..
 
mysql -uroot -p'evan' zabbix< database/mysql/schema.sql
 
mysql -uroot -p'evan' zabbix< database/mysql/images.sql
 
mysql -uroot -p'evan' zabbix< database/mysql/data.sql
 
 
 
vi /etc/init.d/zabbix_server # usually no change needed; only adjust the variables below (I used the default configure paths)
 
# base zabbix dir
 
BASEDIR=/usr/local
 
# binary file
 
ZABBIX_SUCKERD=$BASEDIR/sbin/zabbix_server</pre>
 
 
 
===Installing and using fping===
 
<pre>http://rickie622.blog.163.com/blog/static/2123881120121121111720941/
 
http://netsecurity.51cto.com/art/201101/242200.htm
 
# Downloading the zip archive works too
 
git clone https://github.com/schweikert/fping.git
 
cd fping
 
./autogen.sh
 
./configure
 
make -j2 && make install
 
 
 
# Edit the configuration file
 
vim /etc/zabbix_server.conf
 
#vim /usr/local/etc/zabbix_server.conf
 
FpingLocation=/usr/local/sbin/fping</pre>
 
 
 
===Start Zabbix and enable it at boot===
 
<pre>service zabbix_server start
 
service zabbix_agentd start # start the services
 
 
 
chkconfig zabbix_server on
 
chkconfig zabbix_agentd on # enable at boot
 
 
 
Add a Zabbix virtual host to Nginx
 
#vim /usr/local/nginx/conf/vhosts/monitor.conf
 
 
 
Skip all of the following, otherwise the web installation wizard will not appear
 
#cd zabbix/conf
 
#cp zabbix.conf.php.example zabbix.conf.php ;
 
#chmod 777 zabbix.conf.php
 
 
 
iptables -I INPUT -p tcp --dport 80 -j ACCEPT
 
 
 
zabbix server is not running
 
Zabbix Server is not running: the information displayed may not be current
 
http://song49.blog.51cto.com/4480450/1200151</pre>
 
 
 
===(4) Set the Zabbix server IP and port; the name can be left empty===
 
 
<pre>
 
<pre>
ps:
 
post_max_size = 16M
 
PHP option “max_execution_time” 30 300 Fail
 
PHP option “max_input_time” 60 300 Fail
 
PHP option “date.timezone” unknown Fai
 
date.timezone = Asia/Shanghai
 
  
PHP option “always_populate_raw_post_data” must be set to “-1”
 
  
port 10051
+
evan lai, [10.05.20 21:55]
 +
lxtx_prom_alert_bot
  
Zabbix frontend is ready! The default user name is Admin, password zabbix.</pre>
+
BotFather, [10.05.20 21:55]
===Changing the super user password ===
+
Done! Congratulations on your new bot. You will find it at t.me/lxtx_prom_alert_bot. You can now add a description, about section and profile picture for your bot, see /help for a list of commands. By the way, when you've finished creating your cool bot, ping our Bot Support if you want a better username for it. Just make sure the bot is fully operational before you do this.
<pre>
 
In 2.2 the first letter of the user name must be capitalized, i.e. Admin
 
Default user name: Admin, password: zabbix
 
  
Where to change the login user and password
+
Use this token to access the HTTP API:
use zabbix;
+
1157710367:AAFD9YLsjdQ_t7botbVLa4xxWrOc9LVHNYc
select userid,alias,passwd from users; # view the current values
+
Keep your token secure and store it safely, it can be used by anyone to control your bot.
  
+--------+------------+----------------------------------+
+
For a description of the Bot API, see this page: https://core.telegram.org/bots/api
| userid | alias      | passwd                          |
 
+--------+------------+----------------------------------+
 
|      1 | Admin      | 5fce1b3e34b520afeffb37ce08c7cd66 |
 
  
  
# For Zabbix 3.0, this alone is enough
+
Call the Bot API method getMe (api.telegram.org/bot followed by the token, then /getMe) to get the bot's own id
update users set passwd=MD5('12345') where userid=1;
 
  
  
#zabbix 2.x
+
curl https://api.telegram.org/bot1157710367:AAFD9YLsjdQ_t7xxxxxLa4imWrOV9LVHNYc/getMe
Open another terminal and generate the MD5 hash of the password; the password used here is 12345678
 
  
[root@localhost ~]# echo -n 12345678 |openssl md5 # -n suppresses the trailing newline; without it the hash would be different.
 
(stdin)= 25d55ad283aa400af464c76d713c07ad
 
  
Continuing from above, set a password for the Admin user
+
# Note the literal prefix bot right before the token
 +
sns:~# curl https://api.telegram.org/bot1157710367:AAFD9YLsjdQ_xxxxotbVLa4imWrOV9LVHNYc/getMe
 +
{"ok":true,"result":{"id":1157710367,"is_bot":true,"first_name":"prom_alert_bot","username":"lxtx_prom_alert_bot","can_join_groups":true,"can_read_all_group_messages":false,"supports_inline_queries":false}}
  
mysql> update users set passwd='25d55ad283aa400af464c76d713c07ad' where userid = '1';
+
</pre>
# or directly: update users set passwd=md5('12345678') where userid='1';
 
Query OK, 1 row affected (0.01 sec)
 
Rows matched: 1 Changed: 1 Warnings: 0
 
 
 
mysql> flush privileges;
 
Query OK, 0 rows affected (0.01 sec)
 
 
 
mysql> quit
 
Bye
 
 
 
Changing the password of the Zabbix login account admin
 
http://pvbutler.blog.51cto.com/7662323/1734003
 
 
 
yum install ntp ntpdate -y
 
chkconfig ntpd on
 
/etc/init.d/ntpd start
 
 
 
*/30 * * * * /usr/sbin/ntpdate pool.ntp.org
 
 
 
Append the following after line 29

sed -i '29a user=mysql' /etc/my.cnf

sed -i '29a character-set-server=utf8' /etc/my.cnf

sed -i '29a innodb_file_per_table=1' /etc/my.cnf

Restart mysqld</pre>
 
  
===Firewall settings ===
+
=== Create the group===
<pre>Review this first; prefer stricter firewall settings
 
 
 
#on zabbix-agent
 
 
 
iptables -A INPUT -s zabbixserverip  -p tcp -m tcp --dport 10050 -m comment --comment "zabbix_server listen " -j ACCEPT
 
#iptables -A INPUT -s zabbixserverip  -p tcp -m tcp --dport 10050 -m comment --comment "zabbix_agentd listen " -j ACCEPT
 
 
 
# Do not use the firewall rules below

vi /etc/sysconfig/iptables

-A INPUT -m state --state NEW -m tcp -p tcp --dport 22 -j ACCEPT

-A INPUT -m state --state NEW -m tcp -p tcp --dport 80 -j ACCEPT

-A INPUT -m state --state NEW -m tcp -p tcp --dport 10050 -j ACCEPT

-A INPUT -m state --state NEW -m tcp -p tcp --dport 10051 -j ACCEPT
 
 
 
/etc/init.d/iptables restart
 
 
 
The Chinese language option is under the user profile in the top-right corner
 
 
 
Starting php_fpm /usr/local/php/bin/php-cgi: error while loading shared libraries: libiconv.so.2: cannot open shared object file: No such file or directory
 
failed
 
 
 
by default install the daemon binaries (zabbix_server, zabbix_agentd, zabbix_proxy) in /usr/local/sbin and the client binaries (zabbix_get, zabbix_sender) in /usr/local/bin.</pre>
 
 
 
===4.0 5.2: fixing garbled fonts in graphs===
 
 
<pre>
 
<pre>
 +
Get the group ID
  
1. Replace the font with one that supports Chinese
+
Create a new group in Telegram, add the newly created bot (prom_alert_bot) as a member, then call the API method getUpdates to get the group ID
# Copy the font over
 
#50fbe46d4c5c        zabbix/zabbix-web-nginx-pgsql:alpine-5.2-latest
 
docker cp simkai.ttf 50fbe46d4c5c:/usr/share/zabbix/assets/fonts
 
  
  cp  /root/STKAITI.TTF  /usr/share/zabbix/assets/fonts
+
  curl https://api.telegram.org/bot1157710367:AAFD9YLsjdQ_t7botbVLa4imWrOV9LVHNYc/getUpdates
 
+
{"ok":true,"result":[{"update_id":367831744,
2. Edit the font setting in the PHP file
+
"message":{"message_id":1,"from":{"id":796717144,"is_bot":false,"first_name":"evan","last_name":"lai","username":"linuxsa"},"chat":{"id":-470646458,"title":"alerm","type":"group","all_members_are_administrators":true},"date":1597202656,"new_chat_participant":{"id":1157710367,"is_bot":true,"first_name":"prom_alert_bot","username":"lxtx_prom_alert_bot"},"new_chat_member":{"id":1157710367,"is_bot":true,"first_name":"prom_alert_bot","username":"lxtx_prom_alert_bot"},"new_chat_members":[{"id":1157710367,"is_bot":true,"first_name":"prom_alert_bot","username":"lxtx_prom_alert_bot"}]}}]}
# Change the configured font, or simply delete the default font and give yours the same file name; no restart is needed, at least in docker
 
  grep  -rn  BX_GRAPH_FONT_NAM  /usr/share/zabbix/include/defines.inc.php
 
67:define('ZBX_GRAPH_FONT_NAME', 'DejaVuSans'); // font file name
 
 
 
Change it
 
sed  -i  's!DejaVuSans!simkai!' include/defines.inc.php
 
define('ZBX_GRAPH_FONT_NAME',           'simkai'); 
 
 
</pre>
 
</pre>
  
===zabbix-get===
+
==telegram webhook ==
 +
=== 1. Get the webhook running first ===
 
<pre>
 
<pre>
root@zabbix-server ~]#zabbix_get  -s 10.3.10.139 -k "system.hostname"
 
dev-hello-market
 
  
  
Note that zabbix_get only works when the agent runs in passive mode, which requires the agent's listening port to be reachable.
+
git clone https://github.com/evan886/alertmanager-webhook-telegram-python.git
 +
cd  alertmanager-webhook-telegram-python/docker
 +
docker build -t alertmanager-webhook-telegram:1.0 .
 +
docker run -d --name telegram-bot \
 +
-e "bottoken=1157710367:AxxxxxxQ_t7botbVLa4imWrOV9LVHNYc" \
 +
-e "chatid=4706458" \
 +
-e "username=evan" \
 +
-e "password=evanLxx123" \
 +
-p 9119:9119 alertmanager-webhook-telegram:1.0
 
</pre>
 
</pre>
[https://blog.csdn.net/cx55887/article/details/83818696 自动化监控--zabbix-get安装使用详解]
 
  
==Part 2: agent==
+
==== Configuration ====
 
<pre>
 
<pre>
 +
cat alertmanager/config.yml
  
#4.0 #centos7 quick install and automatic configuration, Fri Aug 23 11:45:01 CST 2019
+
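# Define the routing tree. This root route receives every alert; child routes can be added, e.g. to decide who gets alerts with project: zhidaoAPP (a custom label set in the Prometheus alert rules) and who gets those with project: baoxian
 +
route:
 +
  group_by: ['alertname'] # labels used to group alerts
 +
  group_wait: 10s        # how long to wait before sending the first notification for a new group
 +
  group_interval: 60s    # how long to wait before notifying about new alerts added to a group
 +
  repeat_interval: 1h    # how often a still-firing alert is re-sent; for email do not set this too low, or the SMTP server may reject the flood of messages
 +
  receiver: 'telegram-webhook'      # name of the receiver to notify; must match a name under receivers below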
  
# Outside China
+
# Define the alert receivers
rpm -ivh http://repo.zabbix.com/zabbix/4.0/rhel/7/x86_64/zabbix-release-4.0-1.el7.noarch.rpm
+
receivers:
 +
  - name: 'telegram-webhook'
 +
    webhook_configs:
 +
    - url: http://evan:evanL23@172.31.24.19:9119/alert
  
# Inside China
+
</pre>
https://mirrors.aliyun.com/zabbix/zabbix/4.0/rhel/7/x86_64/zabbix-release-4.0-1.el7.noarch.rpm
 
 
 
Add the repo file by hand; otherwise, if the repo file gets deleted by accident like last time, it takes ages to sort out
 
  
cat <<EOF > /etc/yum.repos.d/zabbix.repo
+
=== Check the result===
[zabbix]
+
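  Normally the TG group should receive messages at this point. If not, stop one node exporter to trigger an alert; if still nothing arrives, something is wrong.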
name=Zabbix Official Repository - \$basearch
 
baseurl=https://mirrors.aliyun.com/zabbix/zabbix/4.0/rhel/7/\$basearch/
 
enabled=1
 
gpgcheck=1
 
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-ZABBIX-A14FE591
 
   
 
[zabbix-non-supported]
 
name=Zabbix Official Repository non-supported - \$basearch
 
baseurl=https://mirrors.aliyun.com/zabbix/non-supported/rhel/7/\$basearch/
 
enabled=1
 
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-ZABBIX
 
gpgcheck=1
 
EOF
 
  
curl https://mirrors.aliyun.com/zabbix/RPM-GPG-KEY-ZABBIX-A14FE591 \
+
== trouble==
-o /etc/pki/rpm-gpg/RPM-GPG-KEY-ZABBIX-A14FE591
+
Alertmanager would not start and kept logging: level=error ts=2019-08-26T05:52:52.19072198Z caller=main.go:337 msg="Loading configuration file failed" file=/usr/local/prometheus/alertmanager/alertmanager.yml err="yaml: unmarshal errors:\n  line 12: field receivers not found in type config.plain"  Fix (Cong's approach): quote the URL, i.e.  - url: 'http://user:password@172.24.103.122:9119/alert'
 
curl https://mirrors.aliyun.com/zabbix/RPM-GPG-KEY-ZABBIX \
 
-o /etc/pki/rpm-gpg/RPM-GPG-KEY-ZABBIX
 
  
 +
== bot  see also==
 +
https://prometheus.io/docs/alerting/latest/configuration/
  
 +
https://core.telegram.org/bots
  
yum install zabbix-agent -y
+
[https://techsoftcenter.com/how-to-create-a-telegram-bot-id-chat-id/ How to Create a Telegram Bot ID/Chat ID]
  
yum install ntp  -y
+
[https://toolbox.kali-linuxtr.net/prometheus-alertmanager-telegram-bot.tool Prometheus Alertmanager Telegram Bot]
timedatectl set-ntp true
 
  
HOSTNAME=prod-java-02
+
[https://www.cnblogs.com/KillBugMe/p/13140226.html 创建telegram 机器人 并发送消息]
  
#config
+
[https://www.teleme.io/articles/create_your_own_telegram_bot?hl=zh-hans 如何创建我自己的电报机器人(Telegram Bot)]
sed -i "s/^Server=127.0.0.1/Server=172.16.1.9/ " /etc/zabbix/zabbix_agentd.conf
 
  
sed -i "s/^ServerActive=127.0.0.1/ServerActive=172.16.1.9/"  /etc/zabbix/zabbix_agentd.conf
+
[https://nova.moe/manage-host-alert-on-telegram-with-grafana/ 在 Telegram 中管理主机监控和警报信息]
sed  -i "s/^Hostname=Zabbix server/Hostname=test-market/"  /etc/zabbix/zabbix_agentd.conf
 
  
 +
https://github.com/inCaller/prometheus_bot
  
 +
https://github.com/metalmatze/alertmanager-bot
  
# This variant uses the HOSTNAME variable, whereas the one above hard-codes the hostname
+
[https://blog.csdn.net/weixin_34242331/article/details/91875514 基于prometheus + grafana + mysql + Telegram 监控告警]
sed  -i 's/127.0.0.1/23.67.81.95/g'  /etc/zabbix/zabbix_agentd.conf
 
sed -i "s/Hostname=Zabbix server/Hostname=${HOSTNAME}/g"  /etc/zabbix/zabbix_agentd.conf
 
grep "^\s*[^# \t].*$" /etc/zabbix/zabbix_agentd.conf
 
  
systemctl  enable  zabbix-agent.service
+
https://my.oschina.net/54188zz/blog/3030618
systemctl restart zabbix-agent
 
  
 +
[https://yunlzheng.gitbook.io/prometheus-book/parti-prometheus-ji-chu/alert/prometheus-alert-rule 自定义Prometheus告警规则]
  
Older notes and explanations follow
+
[https://www.linux.org.ru/forum/general/14894302  prometheus alertmanager telegram ]
  
cat /etc/zabbix/zabbix_agentd.conf
+
[https://www.cnblogs.com/wangxu01/articles/11654836.html 部署Alertmanager实现邮件/钉钉/微信报警]
Hostname=主机名
 
Server=zabbix server ip
 
LogFile= 可以不改
 
  
## Best to set these three
+
[https://www.cnblogs.com/xiaobaozi-95/p/10740511.html prometheus告警插件-alertmanager]
Server=10.6.1.181
 
ServerActive=10.6.1.181
 
Hostname=zabbix-client-1
 
  
  
#rpm -ivh http://repo.zabbix.com/zabbix/3.0/rhel/6/x86_64/zabbix-release-3.0-1.el6.noarch.rpm
 
#rpm -ivh http://repo.zabbix.com/zabbix/2.2/rhel/6/x86_64/zabbix-release-2.2-1.el6.noarch.rpm
 
#http://repo.zabbix.com/zabbix/2.0/rhel/5/x86_64/zabbix-release-2.0-1.el5.noarch.rpm
 
  
 +
[https://github.com/metalmatze/alertmanager-bot This is the Alertmanager bot for Prometheus that notifies you on alerts.]
  
rpm -ivh http://repo.zabbix.com/zabbix/3.0/rhel/7/x86_64/zabbix-release-3.0-1.el7.noarch.rpm
 
  
</pre>
 
===agent on debian===
 
<pre>
 
# Use the distribution's own repository
 
apt-get install zabbix-agent
 
  
 +
https://github.com/metalmatze/alertmanager-bot
  
# config: the same as with yum; only the service start command differs
 
HOSTNAME=wiki
 
sed -i "s/^Server=127.0.0.1/Server=207.148.106.229/ " /etc/zabbix/zabbix_agentd.conf
 
  
sed -i "s/^ServerActive=127.0.0.1/ServerActive=207.148.106.229/" /etc/zabbix/zabbix_agentd.conf
+
[https://www.cnblogs.com/longcnblogs/p/9620733.html Prometheus 和 Alertmanager实战配置]
sed  -i "s/^Hostname=Zabbix server/Hostname=wiki/" /etc/zabbix/zabbix_agentd.conf
 
  
grep "^\s*[^# \t].*$" /etc/zabbix/zabbix_agentd.conf
+
== 微信==
  
service zabbix-agent start
+
[https://blog.csdn.net/knight_zhou/article/details/106937276  Prometheus 微信告警注意事项]
 +
=webhook=
  
  
zabbix_get -s 138.68.59.0 -k "system.hostname"
+
[https://blog.csdn.net/shida_csdn/article/details/81980021  prometheus alertmanager webhook 配置教程]
  
 +
[https://blog.csdn.net/bluuusea/article/details/104619235  prometheus+alertmanager+webhook实现自定义监控报警系统]
  
# Use the Zabbix repository
+
=* intro =
  https://repo.zabbix.com/zabbix/4.0/debian/pool/main/z/zabbix-release/zabbix-release_4.0-3+buster_all.deb       
+
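Alerting in the Prometheus architecture is split into two parts: alert rules are defined and alerts are generated in Prometheus Server, while the Alertmanager component handles the alerts that Prometheus generates. Alertmanager is the central place where alerts are processed in the Prometheus ecosystem.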
dpkg -i zabbix-release_stretch_all.deb
+
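Alertmanager ships with several built-in third-party notification integrations and also supports webhook notifications, through which users can extend alert handling to fit their own needs.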
# apt-get update
 
</pre>
 
https://www.zabbix.com/documentation/3.2/manual/installation/install_from_packages/repository_installation
 
  
https://www.zabbix.com/documentation/4.0/zh/manual/installation/install_from_packages/rhel_centos#%E5%AE%89%E8%A3%85_agent
+
=* ins=
 +
==** using docker or docker-composer==
  
 +
Use the bundled compose file
  
Summary of Zabbix mirrors in China
+
https://hub.docker.com/r/prom/alertmanager/dockerfile
 +
*** docker only
 +
  docker pull prom/alertmanager
 +
  docker run --name alertmanager  -d -p 9093:9093  -v /path/to/config.yml:/etc/alertmanager/conf/config.yml prom/alertmanager
  
https://www.cnblogs.com/caidingyu/p/11423089.html
 
  
https://blog.cactifans.com/2019/01/21/Zabbix%E5%9B%BD%E5%86%85%E6%BA%90%E4%BD%BF%E7%94%A8/
 
  
==docker zabbix 5.0==
 
  
[https://juejin.im/entry/57be598d0a2b58006cd17c0f 用 Zabbix 和 Docker 搭建监控平台]
+
=* conf =
 +
<pre>
 +
rules
  
  
[https://outmanzzq.github.io/2018/08/12/zabbix-docker/ Docker Compose 快速搭建zabbix监控系统]
+
vim node-up.rules
 +
groups:
 +
- name: node-up
 +
  rules:
 +
  - alert: node-up
 +
    expr: up{job="node-exporter"} == 0
 +
    for: 15s
 +
    labels:
 +
      severity: 1
 +
      team: node
 +
    annotations:
 +
      summary: "{{ $labels.instance }} has been down for more than 15s!"
  
Mine has no Zabbix web UI yet; to be sorted out tomorrow
+
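Note: this rule checks whether a node is alive. expr is a PromQL expression that tests whether targets of job="node-exporter" are up. for means the alert stays Pending for 15s before it becomes Firing; once Firing, it is sent to AlertManager. labels and annotations attach extra identifying information to the alert; all of these labels and annotations, plus any labels already set for the job in prometheus.yml, are automatically included in the email body. For more detail on rule configuration, see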
Expose port 8888 for the web page and 10051 for communication with Zabbix-agent;
 
  
[https://www.zabbix.com/cn/whats_new_5_0 Zabbix 5.0 LTS新功能]
+
# alert resolved
  
[https://www.cnblogs.com/itzgr/p/9963156.html  011.Docker Compose部署Zabbix实战 ]
+
</pre>
  
[https://www.cnblogs.com/rongfengliang/p/12925792.html zabbix docker-compose 运行配置 ]
+
=* Custom alert rules=
 +
==** Custom alert rule for CPU load==
 +
<pre>
 +
  - alert: high_load-85per
 +
    expr: (100-(avg(irate(node_cpu_seconds_total{mode="idle"}[5m]))by (job)) * 100)  > 80
 +
    #expr: sum(avg without (cpu)(irate(node_cpu{mode!='idle'}[5m]))) by (instance) > 0.81
 +
    #expr: node_load1 > 0.2
 +
    for: 10m
 +
    labels:
 +
      severity: page
 +
    annotations:
 +
      summary: "Instance {{ $labels.instance }} under high load"
 +
      description: "{{ $labels.instance }} of job {{ $labels.job }} is under high load more than 10 minutes."
  
==Usage==
+
  An email is sent only once the alert is FIRING
  When adding a host, also link a template; otherwise the graphs under Monitoring, Hosts stay empty
+
</pre>
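The arithmetic behind the CPU expression above can be sketched in plain Python. This is only an illustration under assumptions: the function names are hypothetical, the input stands in for the per-series idle rates that `irate(node_cpu_seconds_total{mode="idle"}[5m])` would return for one job, and the `for: 10m` hold time is not modeled.

```python
def cpu_usage_percent(idle_rates):
    """Mirror 100 - avg(idle rate) * 100 for the series of one job.

    idle_rates: idle fraction per series, each between 0 and 1 (illustrative input).
    """
    avg_idle = sum(idle_rates) / len(idle_rates)
    return 100 - avg_idle * 100


def exceeds_threshold(idle_rates, threshold=80):
    """True when usage is above the threshold used in the rule (> 80)."""
    return cpu_usage_percent(idle_rates) > threshold


print(cpu_usage_percent([0.25, 0.25]))    # two series, each 25% idle
print(exceeds_threshold([0.125, 0.125]))  # each 12.5% idle
```

In Prometheus the condition must additionally hold for the whole `for` window before the alert fires.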
https://www.zabbix.com/documentation/current/manual/quickstart/host
+
==** Custom alert rule for memory==
 
+
<pre>#rules file: mind the leading spaces (indentation)
==Adding users and user groups==
+
- alert: hostMemUsageAlert
 
+
    expr: ((node_memory_MemTotal_bytes -(node_memory_MemFree_bytes+node_memory_Buffers_bytes+node_memory_Cached_bytes) )/node_memory_MemTotal_bytes ) * 100 > 90
Create the user group before adding a user, then assign the user to a group, which determines what the user can see
+
    #expr: (node_memory_MemTotal - node_memory_MemAvailable)/node_memory_MemTotal > 0.85
 
+
    for: 1m
[https://www.cnblogs.com/fanlong0212/p/12248049.html  zabbix4.4新建用户组和用户权限设置 ]
+
    labels:
 
+
      severity: page
[https://www.zabbix.com/documentation/4.0/zh/manual/quickstart/login 登陆和配置用户]
+
    annotations:
 
+
      summary: "Instance {{ $labels.instance }} MEM usage high"
 
+
      description: "{{ $labels.instance }} MEM usage above 90% (current value: {{ $value }})"
=Configuring Telegram alerts in Zabbix=
 
<pre>
 
  
  
 
</pre>
 
</pre>
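The memory expression in the rule above is plain arithmetic, which the following Python sketch spells out. The function names and the sample byte counts are assumptions made for illustration; the four inputs correspond to node_memory_MemTotal_bytes, node_memory_MemFree_bytes, node_memory_Buffers_bytes and node_memory_Cached_bytes.

```python
def memory_usage_percent(total, free, buffers, cached):
    """Mirror ((MemTotal - (MemFree + Buffers + Cached)) / MemTotal) * 100."""
    used = total - (free + buffers + cached)
    return used / total * 100


def exceeds_threshold(total, free, buffers, cached, threshold=90):
    """True when usage is above the threshold used in the rule (> 90)."""
    return memory_usage_percent(total, free, buffers, cached) > threshold


# Illustrative values only (units cancel out, so any consistent unit works).
print(memory_usage_percent(1024, 64, 32, 32))
print(exceeds_threshold(1024, 32, 16, 16))
```

Buffers and page cache are subtracted because the kernel can reclaim them, so counting them as used would overstate memory pressure.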
 +
Custom alert rule that worked, 2020
 +
https://www.shared-code.com/article/84
  
https://www.zabbix.com/cn/integrations/telegram#tab:official1
+
This one works; the one above does not
 
 
[https://blog.csdn.net/weixin_33699914/article/details/92336106  配置zabbix+telegram告警]
 
  
[https://www.cnblogs.com/yeyu1314/p/10071279.html  zabbix+telegram的API接口(告警) ]
+
((node_memory_MemTotal_bytes -(node_memory_MemFree_bytes+node_memory_Buffers_bytes+node_memory_Cached_bytes) )/node_memory_MemTotal_bytes ) * 100 > 90
  
[https://blog.51cto.com/13555423/2469571 Zabbix配置Telegram告警(无坑文档)]
+
[https://www.shared-code.com/article/84  常用prometheus告警规则模板(三]
  
[https://my.oschina.net/u/4302302/blog/3830481 zabbix 用Telegram报警!!!]
 
  
[https://www.cnblogs.com/yeyu1314/p/10071279.html  zabbix+telegram的API接口(告警) ]
+
[https://www.bookstack.cn/read/prometheus-book/alert-prometheus-alert-rule.md 自定义Prometheus告警规则]
  
= Troubleshooting and review=
+
==** Custom alert rule for disk==
 
<pre>
 
<pre>
Problem 1: zabbix server is not listening on port 10051 and the frontend shows no graphs
+
- alert: LowDiskSpaceNodeFilesystemUsage
 
+
    expr: 100 - (node_filesystem_free_bytes{mountpoint="/",fstype=~"ext4|xfs"} / node_filesystem_size_bytes{fstype=~"ext4|xfs"} * 100) > 80
Version: zabbix 2.2
+
    for: 1m
 +
    labels:
 +
      severity: warning
 +
    annotations:
 +
      summary: "Instance {{ $labels.instance }} :{{ $labels.mountpoint }} partition usage is too high"
 +
      description: "{{ $labels.instance  }} : {{ $labels.job  }} :{{ $labels.mountpoint  }} partition usage is above 80% (current value: {{ $value }})"
 +
</pre>
  
 +
=see also=
 +
A new environment may also need grouping and the like
  
 +
[https://blog.csdn.net/y_xiao_/article/details/50818451?utm_medium=distribute.pc_relevant.none-task-blog-BlogCommendFromMachineLearnPai2-1.add_param_isCf&depth_1-utm_source=distribute.pc_relevant.none-task-blog-BlogCommendFromMachineLearnPai2-1.add_param_isCf  Prometheus监控 - Alertmanager报警模块]
  
Port 10051 was missing because
+
[https://my.oschina.net/OutOfMemory/blog/4706596 Prometheus监控告警浅析]
  
DBPassword='123' had to be changed to DBPassword=123
+
[https://www.cnblogs.com/winstom/p/11940570.html Alertmanager 部署配置]
  
 +
[https://blog.51cto.com/lookingdream/2504572 Prometheus监控node_exporter的告警规则]
  
Check the log
 
tail  /tmp/zabbix_server.log
 
  
14659:20170525:171042.257 [Z3001] connection to database 'zabbix' failed: [1045] Access denied for user 'zabbix'@'localhost' (using password: YES)
+
[https://juejin.im/post/6844903880778579976 Prometheus学习系列(三十九)之报警模板例子 ]
  
 +
https://prometheus.io/docs/alerting/alertmanager/
  
Correct output looks like this
+
[https://www.jianshu.com/p/239b145e2acc Prometheus Alertmanager报警组件]
[root@ zabbix]# netstat  -nlpt
 
Active Internet connections (only servers)
 
Proto Recv-Q Send-Q Local Address              Foreign Address            State      PID/Program name 
 
tcp        0      0 0.0.0.0:10051
 
</pre>
 
=== 经常报Zabbix agent on Zabbix server is unreachable for 5 minutes ===
 
<pre>
 
 
 
Last time, restarting the zbx server fixed it
 
 
 
</pre>
 
  
[https://blog.csdn.net/weixin_34226706/article/details/85080769 解决Zabbix使用一段时间后总报Zabbix Agent不可到达的问题]
+
[https://blog.csdn.net/qq_25178661/article/details/86690729 good-prometheus + AlertManager 实现对多node节点CPU和内存信息的监控]
  
[https://blog.csdn.net/weixin_33721344/article/details/92968417 防火墙导致 zabbix监控大批量报警zabbix agent on **** unreachable for 5 minute]
+
[https://blog.csdn.net/kozazyh/article/details/80636512  prometheus-常用的监控告警规则]
  
==References==
+
[https://blog.51cto.com/jerrymin/2333824  Prometheus配合Alertmanager报警系统]
  
[https://www.howtoforge.com/tutorial/install-zabbix-monitoring-server-and-agent-on-debian-9/ Install Zabbix Monitoring Server and Agent on Debian]
+
[https://www.cnblogs.com/longcnblogs/p/9620733.html Prometheus 和 Alertmanager实战配置]
  
[https://www.cnblogs.com/yanjieli/p/13651859.html  zabbix--5.0.2部署使用手册 ]
+
[https://www.kancloud.cn/huyipow/prometheus/527563 alertmanager报警规则详解]
  
[https://blog.csdn.net/weixin_42743410/article/details/81482728  zabbix 3.0使用教程]
 
  
[http://blog.51cto.com/guoxh/2089204 Zabbix 3.0 详解:从添加主机到发送报警通知]
+
[https://blog.csdn.net/wang725/article/details/94174331  prometheus - 监控磁盘]
  
[https://my.oschina.net/zhouyuntai/blog/1788830 Zabbix监控系统 (3) 之 添加自定义监控项目、配置邮件告警、测试告警]
+
[https://blog.csdn.net/mnasd/article/details/86694412  Prometheus自定义监控部署]
  
http://blog.linuxchina.net/?p=1711
+
[https://www.ctolib.com/docs/sfile/prometheus-book/alert/prometheus-alert-rule.html 自定义Prometheus告警规则]
  
[https://www.cnblogs.com/enjoycode/p/zabbix_3_installation_on_centos_7.html  Zabbix 3.0 with apache安装笔记]
+
[https://blog.csdn.net/weixin_33827731/article/details/92947113?utm_medium=distribute.pc_relevant_t0.none-task-blog-BlogCommendFromMachineLearnPai2-1.nonecase&depth_1-utm_source=distribute.pc_relevant_t0.none-task-blog-BlogCommendFromMachineLearnPai2-1.nonecase 监控指标以及prometheus规则-不断完善中]
  
[https://www.cnblogs.com/zhenglisai/p/6547402.html 【zabbix】自定义监控项key值]
+
[https://www.cnblogs.com/xiangsikai/p/11290000.html Prometheus 编写告警规则案例]
  
 +
[https://www.jianshu.com/p/1f05476ebcee 使用prometheus自定义监控]
  
[https://blog.csdn.net/zhengchaooo/article/details/79499991 zabbix添加自定义py脚本]
+
[https://blog.csdn.net/chubi7812/article/details/100612951?utm_medium=distribute.pc_relevant_t0.none-task-blog-BlogCommendFromMachineLearnPai2-1.nonecase&depth_1-utm_source=distribute.pc_relevant_t0.none-task-blog-BlogCommendFromMachineLearnPai2-1.nonecase  prometheus通过node_exporter抓取的数据准确计算磁盘使用率]
  
[https://juejin.cn/entry/6844903442452856845 用 Zabbix 和 Docker 搭建监控平台]
+
=k8s =
  
 [[category:zabbix]]
+
[https://www.qikqiak.com/post/alertmanager-of-prometheus-in-practice/ Prometheus报警AlertManager实战]
 +
[[category:ops]] [[category:container]] [[category:prom]]

Revision as of 08:03, 10 December 2020 (Thursday)

* my email

@126.com

Authorization password

telegram

Via prome

Worth a look when there is time: https://github.com/metalmatze/alertmanager-bot

* Create the tg bot and the alert group

** Create the bot

*** 202011 example of creating a bot

#2020
evan lai, [29.10.20 16:50]
/start

BotFather, [29.10.20 16:50]
I can help you create and manage Telegram bots. If you're new to the Bot API, please see the manual (https://core.telegram.org/bots).

You can control me by sending these commands:

/newbot - create a new bot
/mybots - edit your bots [beta]

Edit Bots
/setname - change a bot's name
/setdescription - change bot description
/setabouttext - change bot about info
/setuserpic - change bot profile photo
/setcommands - change the list of commands
/deletebot - delete a bot

Bot Settings
/token - generate authorization token
/revoke - revoke bot access token
/setinline - toggle inline mode (https://core.telegram.org/bots/inline)
/setinlinegeo - toggle inline location requests (https://core.telegram.org/bots/inline#location-based-results)
/setinlinefeedback - change inline feedback (https://core.telegram.org/bots/inline#collecting-feedback) settings
/setjoingroups - can your bot be added to groups?
/setprivacy - toggle privacy mode (https://core.telegram.org/bots#privacy-mode) in groups

Games
/mygames - edit your games (https://core.telegram.org/bots/games) [beta]
/newgame - create a new game (https://core.telegram.org/bots/games)
/listgames - get a list of your games
/editgame - edit a game
/deletegame - delete an existing game

BotFather, [29.10.20 16:50]
Alright, a new bot. How are we going to call it? Please choose a name for your bot.

evan lai, [29.10.20 16:50]
/newbot

evan lai, [29.10.20 16:51]
evan_alert_bot

BotFather, [29.10.20 16:51]
Good. Now let's choose a username for your bot. It must end in `bot`. Like this, for example: TetrisBot or tetris_bot.

evan lai, [29.10.20 16:51]
evan_alert_bot

BotFather, [29.10.20 16:51]
Done! Congratulations on your new bot. You will find it at t.me/evan_alert_bot. You can now add a description, about section and profile picture for your bot, see /help for a list of commands. By the way, when you've finished creating your cool bot, ping our Bot Support if you want a better username for it. Just make sure the bot is fully operational before you do this.

Use this token to access the HTTP API:
1363904888:AAGeUIoxxRMlxk9zHUa2MTRi1My9HDBP69w
Keep your token secure and store it safely, it can be used by anyone to control your bot.

For a description of the Bot API, see this page: https://core.telegram.org/bots/api

Useful information



evan lai, [10.05.20 21:55]
lxtx_prom_alert_bot

BotFather, [10.05.20 21:55]
Done! Congratulations on your new bot. You will find it at t.me/lxtx_prom_alert_bot. You can now add a description, about section and profile picture for your bot, see /help for a list of commands. By the way, when you've finished creating your cool bot, ping our Bot Support if you want a better username for it. Just make sure the bot is fully operational before you do this.

Use this token to access the HTTP API:
1157710367:AAFD9YLsjdQ_t7botbVLa4xxWrOc9LVHNYc
Keep your token secure and store it safely, it can be used by anyone to control your bot.

For a description of the Bot API, see this page: https://core.telegram.org/bots/api


Call the Bot API method getMe (api.telegram.org/bot followed by the token, then /getMe) to get the bot's own id


curl https://api.telegram.org/bot1157710367:AAFD9YLsjdQ_t7xxxxxLa4imWrOV9LVHNYc/getMe


# Note the literal prefix bot right before the token
sns:~# curl https://api.telegram.org/bot1157710367:AAFD9YLsjdQ_xxxxotbVLa4imWrOV9LVHNYc/getMe
{"ok":true,"result":{"id":1157710367,"is_bot":true,"first_name":"prom_alert_bot","username":"lxtx_prom_alert_bot","can_join_groups":true,"can_read_all_group_messages":false,"supports_inline_queries":false}}
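A minimal Python sketch of how the getMe URL is put together, assuming the token has the `<digits>:<secret>` shape seen in the BotFather reply. The helper names `bot_api_url` and `bot_id_from_token` and the token regex are illustrative assumptions, not part of the Bot API, and the token used in any call should be your own.

```python
import re

API_BASE = "https://api.telegram.org"

# Assumed token shape, based on the BotFather reply: "<numeric bot id>:<secret>".
TOKEN_RE = re.compile(r"^(\d+):([A-Za-z0-9_-]+)$")


def bot_api_url(token: str, method: str) -> str:
    """Build a Bot API URL; the literal "bot" prefix precedes the token."""
    if not TOKEN_RE.match(token):
        raise ValueError("token should look like '<digits>:<secret>'")
    return f"{API_BASE}/bot{token}/{method}"


def bot_id_from_token(token: str) -> int:
    """Return the numeric part before the colon.

    In the getMe output above this number matches the "id" field.
    """
    match = TOKEN_RE.match(token)
    if match is None:
        raise ValueError("token should look like '<digits>:<secret>'")
    return int(match.group(1))
```

Forgetting the `bot` prefix is the usual reason such a URL fails, which is why the helper adds it in exactly one place.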

Create a group

Get the group ID

Create a new group in Telegram, add the bot you just created (prom_alert_bot) as a member, then call the getUpdates API method to get the group ID

 curl https://api.telegram.org/bot1157710367:AAFD9YLsjdQ_t7botbVLa4imWrOV9LVHNYc/getUpdates
{"ok":true,"result":[{"update_id":367831744,
"message":{"message_id":1,"from":{"id":796717144,"is_bot":false,"first_name":"evan","last_name":"lai","username":"linuxsa"},"chat":{"id":-470646458,"title":"alerm","type":"group","all_members_are_administrators":true},"date":1597202656,"new_chat_participant":{"id":1157710367,"is_bot":true,"first_name":"prom_alert_bot","username":"lxtx_prom_alert_bot"},"new_chat_member":{"id":1157710367,"is_bot":true,"first_name":"prom_alert_bot","username":"lxtx_prom_alert_bot"},"new_chat_members":[{"id":1157710367,"is_bot":true,"first_name":"prom_alert_bot","username":"lxtx_prom_alert_bot"}]}}]}

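The group ID is the `chat.id` field of the getUpdates reply. A small Python sketch for pulling it out, assuming the reply has the shape shown above; the function name `chat_ids` is made up for illustration and the sample is a trimmed copy of that reply.

```python
import json


def chat_ids(updates_json: str) -> dict:
    """Map chat id -> chat title for every message in a getUpdates reply."""
    reply = json.loads(updates_json)
    if not reply.get("ok"):
        raise ValueError("Bot API call failed")
    found = {}
    for update in reply.get("result", []):
        chat = update.get("message", {}).get("chat")
        if chat is not None:
            found[chat["id"]] = chat.get("title")
    return found


# Trimmed copy of the reply shown above.
sample = (
    '{"ok":true,"result":[{"update_id":367831744,"message":{"message_id":1,'
    '"chat":{"id":-470646458,"title":"alerm","type":"group"}}}]}'
)
print(chat_ids(sample))  # {-470646458: 'alerm'}
```

Group chat ids are negative, so the leading minus sign is part of the id.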
telegram webhook

1. Get the webhook running first



git clone https://github.com/evan886/alertmanager-webhook-telegram-python.git
cd   alertmanager-webhook-telegram-python/docker 
docker build -t alertmanager-webhook-telegram:1.0 .
docker run -d --name telegram-bot \
	-e "bottoken=1157710367:AxxxxxxQ_t7botbVLa4imWrOV9LVHNYc" \
	-e "chatid=4706458" \
	-e "username=evan" \
	-e "password=evanLxx123" \
	-p 9119:9119 alertmanager-webhook-telegram:1.0

Configuration

cat alertmanager/config.yml

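# Define the routing tree. This root route receives every alert; child routes can be added, e.g. to send project: zhidaoAPP (a custom label set in the Prometheus alerting rules) to one receiver and project: baoxian to another
route:
  group_by: ['alertname'] # labels used to group alerts
  group_wait: 10s         # how long to wait before sending the first notification for a new group
  group_interval: 60s     # how long to wait before notifying about new alerts added to an existing group
  repeat_interval: 1h     # how often to resend a notification that is still firing; for email do not set this too low, or the SMTP server may reject the overly frequent mail
  receiver: 'telegram-webhook'       # name of the receiver that gets the alerts; must match a name under receivers below

# Define the alert receivers
receivers:
  - name: 'telegram-webhook'
    webhook_configs:
    - url: 'http://evan:<password>@<webhook-host>:9119/alert'  # quoted; replace the placeholders with your own values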
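To make `group_by: ['alertname']` concrete, here is a simplified Python model of grouping: alerts that share the same values for the `group_by` labels end up in one notification group. This is only a sketch of the idea, not Alertmanager's implementation; `group_alerts` is a made-up name, and treating a missing label as an empty string is an assumption of the sketch.

```python
from collections import defaultdict


def group_alerts(alerts, group_by):
    """Bucket alerts by the values of the labels listed in group_by."""
    groups = defaultdict(list)
    for alert in alerts:
        # Assumption of this sketch: a missing label counts as "".
        key = tuple(alert["labels"].get(name, "") for name in group_by)
        groups[key].append(alert)
    return dict(groups)
```

With `group_by=["alertname"]`, two `node-up` alerts from different instances fall into one group and arrive as a single notification, while a `high_load-85per` alert forms its own group.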

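Check the result

Normally your Telegram group should be receiving messages by now. If it is not, stop one node_exporter to trigger an alert; if nothing arrives after that, something is misconfigured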
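Instead of stopping an exporter, a hand-made alert can be posted to the webhook. The Python sketch below only builds the request, assuming the webhook listens on `/alert` with HTTP basic auth as in the docker run command and accepts the standard Alertmanager webhook JSON. Whether this particular webhook accepts such a minimal payload is untested; `build_test_request` and the `manual-test` alert are made-up names.

```python
import base64
import json
import urllib.request


def build_test_request(url, username, password):
    """Build (but do not send) a POST carrying one firing test alert."""
    payload = {
        "version": "4",
        "status": "firing",
        "receiver": "telegram-webhook",
        "groupLabels": {"alertname": "manual-test"},
        "commonLabels": {"alertname": "manual-test"},
        "commonAnnotations": {},
        "alerts": [
            {
                "status": "firing",
                "labels": {"alertname": "manual-test", "instance": "localhost:9100"},
                "annotations": {"summary": "manual test alert"},
                "startsAt": "2020-01-01T00:00:00Z",
                "endsAt": "0001-01-01T00:00:00Z",
            }
        ],
    }
    credentials = base64.b64encode(f"{username}:{password}".encode()).decode()
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {credentials}",
        },
        method="POST",
    )
```

To send it, pass the returned object to `urllib.request.urlopen(...)` from a machine that can reach port 9119.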

trouble

It would not start and kept reporting: level=error ts=2019-08-26T05:52:52.19072198Z caller=main.go:337 msg="Loading configuration file failed" file=/usr/local/prometheus/alertmanager/alertmanager.yml err="yaml: unmarshal errors:\n  line 12: field receivers not found in type config.plain"   Fix (Cong's approach): quote the URL, i.e.  - url: 'http://user:password@172.24.103.122:9119/alert'

bot see also

https://prometheus.io/docs/alerting/latest/configuration/

https://core.telegram.org/bots

How to Create a Telegram Bot ID/Chat ID

Prometheus Alertmanager Telegram Bot

Create a Telegram bot and send messages

How to create my own Telegram bot

Managing host monitoring and alert messages in Telegram

https://github.com/inCaller/prometheus_bot

Monitoring and alerting based on Prometheus + Grafana + MySQL + Telegram

https://my.oschina.net/54188zz/blog/3030618

Custom Prometheus alerting rules

prometheus alertmanager telegram

Deploying Alertmanager for email/DingTalk/WeChat alerts

Prometheus alerting plugin: Alertmanager


This is the Alertmanager bot for Prometheus that notifies you on alerts.


https://github.com/metalmatze/alertmanager-bot


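Prometheus and Alertmanager: hands-on configuration

WeChat

Notes on Prometheus WeChat alerting

webhook

Prometheus Alertmanager webhook configuration tutorial

Building a custom monitoring and alerting system with Prometheus + Alertmanager + webhook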

* intro

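In the Prometheus architecture, alerting is split into two parts: alerting rules are defined in Prometheus Server, which evaluates them and generates alerts, while the Alertmanager component handles the alerts that Prometheus produces. Alertmanager is thus the central place where alerts are processed in the Prometheus ecosystem. It ships with several built-in notification integrations for third-party services and also supports webhook notifications, through which users can extend alert handling in more customized ways.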
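As an illustration of the webhook extension point: Alertmanager POSTs a JSON document containing an `alerts` list, and a receiver turns it into whatever message it wants. The Python sketch below shows only that formatting step, under the assumption that each alert carries `status`, `labels` and `annotations`; `format_alert_message` is a hypothetical helper, not code taken from any of the webhook projects mentioned here.

```python
def format_alert_message(payload: dict) -> str:
    """Turn an Alertmanager webhook payload into one text line per alert."""
    lines = []
    for alert in payload.get("alerts", []):
        labels = alert.get("labels", {})
        annotations = alert.get("annotations", {})
        status = alert.get("status", "unknown").upper()
        name = labels.get("alertname", "unnamed")
        summary = annotations.get("summary", "")
        # rstrip() drops the trailing space when there is no summary.
        lines.append(f"[{status}] {name}: {summary}".rstrip())
    return "\n".join(lines)
```

A Telegram webhook would then pass the returned text to the Bot API's sendMessage method together with the group's chat id.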

* ins

** using docker or docker-compose

Use the bundled compose file

https://hub.docker.com/r/prom/alertmanager/dockerfile

      • docker only
 docker pull prom/alertmanager
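 # the image does not read /etc/alertmanager/conf/config.yml by default, so point --config.file at the mounted file
 docker run --name alertmanager  -d -p 9093:9093   -v /path/to/config.yml:/etc/alertmanager/conf/config.yml prom/alertmanager --config.file=/etc/alertmanager/conf/config.yml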



* conf

rules


 vim node-up.rules
groups:
- name: node-up
  rules:
  - alert: node-up
    expr: up{job="node-exporter"} == 0
    for: 15s
    labels:
      severity: 1
      team: node
    annotations:
      summary: "{{ $labels.instance }} has been down for more than 15s!"

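A note on this rule: it checks whether a node is alive. expr is a PromQL expression that tests whether the targets of job="node-exporter" are up; for means the alert waits 15s in the Pending state before moving to Firing, and once it is Firing the alert is sent to AlertManager. labels and annotations attach extra identifying information to the alert; all of these labels and annotations, plus any labels already set for the job in prometheus.yml, are automatically included in the email body. For more detail on rule configuration, see the references below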
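The Pending to Firing transition can be modelled in a few lines of Python. This is a simplified sketch of the `for` behaviour, not Prometheus code: `alert_states` is a made-up name, and each sample stands for one rule evaluation at a given second.

```python
def alert_states(samples, for_seconds):
    """Return the alert state after each evaluation.

    samples: list of (timestamp_seconds, expr_is_true) tuples in time order.
    """
    states = []
    active_since = None  # time at which expr first became true
    for ts, is_true in samples:
        if not is_true:
            # The condition cleared: the alert goes back to inactive.
            active_since = None
            states.append("inactive")
            continue
        if active_since is None:
            active_since = ts
        held_long_enough = ts - active_since >= for_seconds
        states.append("firing" if held_long_enough else "pending")
    return states
```

With `for_seconds=15`, a target that is down at seconds 5, 10 and 20 is pending twice and firing at the third evaluation; if it recovers in between, the timer starts over.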

# alert resolved

* Custom alerting rules

** Custom CPU load alerting rule

  - alert: high_load-85per
    expr: (100-(avg(irate(node_cpu_seconds_total{mode="idle"}[5m]))by (instance, job)) * 100)  > 80
    #expr: sum(avg without (cpu)(irate(node_cpu{mode!='idle'}[5m]))) by (instance) > 0.81
    #expr: node_load1 > 0.2
    for: 10m
    labels:
      severity: page
    annotations:
      summary: "Instance {{ $labels.instance }} under high load"
      description: "{{ $labels.instance }} of job {{ $labels.job }} is under high load more than 10 minutes."

 An email is sent only once the alert is FIRING
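The arithmetic behind the CPU expression, as a rough Python sketch: `irate` takes the last two samples of each idle counter and turns them into a per-second rate, the rates are averaged, and the result is subtracted from 100. The function name and the input layout are assumptions made for this example.

```python
def cpu_usage_percent(idle_samples):
    """Estimate CPU usage from the last two idle-counter samples per CPU.

    idle_samples: {cpu: ((t_prev, idle_prev), (t_last, idle_last))},
    with times in seconds and idle values in CPU-seconds.
    """
    rates = []
    for (t_prev, idle_prev), (t_last, idle_last) in idle_samples.values():
        # Fraction of the interval this CPU spent idle (0.0 to 1.0).
        rates.append((idle_last - idle_prev) / (t_last - t_prev))
    return 100 - (sum(rates) / len(rates)) * 100
```

For example, one CPU that was idle for 12 of the last 15 seconds and another that was idle for 3 of them average out to about 50% usage, which is below the threshold of 80.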

** Custom memory alerting rule

# rules file: mind the leading indentation
  - alert: hostMemUsageAlert
    expr: ((node_memory_MemTotal_bytes -(node_memory_MemFree_bytes+node_memory_Buffers_bytes+node_memory_Cached_bytes) )/node_memory_MemTotal_bytes ) * 100 > 90
    #expr: (node_memory_MemTotal - node_memory_MemAvailable)/node_memory_MemTotal > 0.85
    for: 1m
    labels:
      severity: page
    annotations:
      summary: "Instance {{ $labels.instance }} MEM usage high"
      description: "{{ $labels.instance }} MEM usage above 90% (current value: {{ $value }})"


A custom alerting rule that worked (2020): https://www.shared-code.com/article/84

This one works; the one above did not

((node_memory_MemTotal_bytes -(node_memory_MemFree_bytes+node_memory_Buffers_bytes+node_memory_Cached_bytes) )/node_memory_MemTotal_bytes ) * 100 > 90
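The expression can be checked by hand with a few numbers. A tiny Python sketch that mirrors it term by term; the byte values are made up, and `mem_usage_percent` is an illustrative name.

```python
def mem_usage_percent(total, free, buffers, cached):
    """Mirror the PromQL: (total - (free + buffers + cached)) / total * 100."""
    return (total - (free + buffers + cached)) / total * 100


THRESHOLD = 90  # same threshold as in the rule

# Made-up sample values, in bytes.
usage = mem_usage_percent(
    total=8_000_000_000,
    free=300_000_000,
    buffers=100_000_000,
    cached=200_000_000,
)
would_alert = usage > THRESHOLD
```

Here 7.4 GB of 8 GB counts as used, roughly 92.5%, so the condition holds; the alert fires once it has held for the rule's `for` duration.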

Common Prometheus alerting rule templates (part 3)


Custom Prometheus alerting rules

** Custom disk alerting rule

  - alert: LowDiskSpaceNodeFilesystemUsage
    expr: 100 - (node_filesystem_free_bytes{mountpoint="/",fstype=~"ext4|xfs"} / node_filesystem_size_bytes{fstype=~"ext4|xfs"} * 100) > 80
    for: 1m
    labels:
      severity: warning
    annotations:
      summary: "Instance {{ $labels.instance  }} :{{ $labels.mountpoint }} partition usage too high"
      description: "{{ $labels.instance  }} : {{ $labels.job  }} :{{ $labels.mountpoint  }} usage on this partition is above 80% (current value: {{ $value }})"

see also

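A new environment may also need some alert grouping set up

Prometheus monitoring: the Alertmanager alerting module

A brief analysis of Prometheus monitoring and alerting

Alertmanager deployment and configuration

Alerting rules for monitoring node_exporter with Prometheus


Prometheus learning series (39): alert template examples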

https://prometheus.io/docs/alerting/alertmanager/

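The Prometheus Alertmanager alerting component

good: Prometheus + AlertManager for monitoring CPU and memory across multiple nodes

Prometheus: commonly used monitoring and alerting rules

Alerting with Prometheus and Alertmanager

Prometheus and Alertmanager: hands-on configuration

Alertmanager alerting rules explained in detail


Prometheus: monitoring disks

Deploying custom monitoring with Prometheus

Custom Prometheus alerting rules

Monitoring metrics and Prometheus rules (work in progress)

Examples of writing Prometheus alerting rules

Custom monitoring with Prometheus

Accurately calculating disk usage from data Prometheus scrapes via node_exporter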

k8s

Prometheus alerting with AlertManager in practice