Difference between pages "Alertmanager" and "Filebeat nginx log"

From the linux中国网 wiki
(Difference between pages)
 
 
Line 1: Line 1:
=*  my email =
+
[[category:ops]]
@126.com
 
  
Authorization password
+
=ins and config=
=telegram=
+
==Download and install Filebeat==
Via prome



Worth a look when there is time:
 
https://github.com/metalmatze/alertmanager-bot
 
==*  Create the Telegram bot and alert group==



===** Create the bot ===



====*** 202011: example of creating a bot====
 
 
<pre>
 
<pre>
#2020
 
evan lai, [29.10.20 16:50]
 
/start
 
 
BotFather, [29.10.20 16:50]
 
I can help you create and manage Telegram bots. If you're new to the Bot API, please see the manual (https://core.telegram.org/bots).
 
 
You can control me by sending these commands:
 
 
/newbot - create a new bot
 
/mybots - edit your bots [beta]
 
 
Edit Bots
 
/setname - change a bot's name
 
/setdescription - change bot description
 
/setabouttext - change bot about info
 
/setuserpic - change bot profile photo
 
/setcommands - change the list of commands
 
/deletebot - delete a bot
 
 
Bot Settings
 
/token - generate authorization token
 
/revoke - revoke bot access token
 
/setinline - toggle inline mode (https://core.telegram.org/bots/inline)
 
/setinlinegeo - toggle inline location requests (https://core.telegram.org/bots/inline#location-based-results)
 
/setinlinefeedback - change inline feedback (https://core.telegram.org/bots/inline#collecting-feedback) settings
 
/setjoingroups - can your bot be added to groups?
 
/setprivacy - toggle privacy mode (https://core.telegram.org/bots#privacy-mode) in groups
 
 
Games
 
/mygames - edit your games (https://core.telegram.org/bots/games) [beta]
 
/newgame - create a new game (https://core.telegram.org/bots/games)
 
/listgames - get a list of your games
 
/editgame - edit a game
 
/deletegame - delete an existing game
 
 
BotFather, [29.10.20 16:50]
 
Alright, a new bot. How are we going to call it? Please choose a name for your bot.
 
  
evan lai, [29.10.20 16:50]
 
/newbot
 
  
evan lai, [29.10.20 16:51]
+
curl -L -O https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-7.12.0-amd64.deb
evan_alert_bot
+
sudo dpkg -i filebeat-7.12.0-amd64.deb
 
 
BotFather, [29.10.20 16:51]
 
Good. Now let's choose a username for your bot. It must end in `bot`. Like this, for example: TetrisBot or tetris_bot.
 
 
 
evan lai, [29.10.20 16:51]
 
evan_alert_bot
 
 
 
BotFather, [29.10.20 16:51]
 
Done! Congratulations on your new bot. You will find it at t.me/evan_alert_bot. You can now add a description, about section and profile picture for your bot, see /help for a list of commands. By the way, when you've finished creating your cool bot, ping our Bot Support if you want a better username for it. Just make sure the bot is fully operational before you do this.
 
 
 
Use this token to access the HTTP API:
 
1363904888:AAGeUIoxxRMlxk9zHUa2MTRi1My9HDBP69w
 
Keep your token secure and store it safely, it can be used by anyone to control your bot.
 
 
 
For a description of the Bot API, see this page: https://core.telegram.org/bots/api
 
 
</pre>
 
</pre>
 
+
==Edit the configuration ==
====Useful information ====
 
 
<pre>
 
<pre>
 +
Modify /etc/filebeat/filebeat.yml to set the connection information:
  
 +
output.elasticsearch:
 +
  hosts: ["<es_url>"]
 +
  username: "elastic"
 +
  password: "<password>"
 +
setup.kibana:
 +
  host: "<kibana_url>"
  
evan lai, [10.05.20 21:55]
+
Where <password> is the password of the elastic user, <es_url> is the URL of Elasticsearch, and <kibana_url> is the URL of Kibana.
lxtx_prom_alert_bot
 
 
 
BotFather, [10.05.20 21:55]
 
Done! Congratulations on your new bot. You will find it at t.me/lxtx_prom_alert_bot. You can now add a description, about section and profile picture for your bot, see /help for a list of commands. By the way, when you've finished creating your cool bot, ping our Bot Support if you want a better username for it. Just make sure the bot is fully operational before you do this.
 
 
 
Use this token to access the HTTP API:
 
1157710367:AAFD9YLsjdQ_t7botbVLa4xxWrOc9LVHNYc
 
Keep your token secure and store it safely, it can be used by anyone to control your bot.
 
 
 
For a description of the Bot API, see this page: https://core.telegram.org/bots/api
 
 
 
 
 
Call the Bot API method getMe (URL form: API/bot<token>/getMe) to get the bot's own id
 
 
 
 
 
curl https://api.telegram.org/bot1157710367:AAFD9YLsjdQ_t7xxxxxLa4imWrOV9LVHNYc/getMe
 
 
 
 
 
# Note: in the URL the token is prefixed with the literal string "bot"
 
sns:~# curl https://api.telegram.org/bot1157710367:AAFD9YLsjdQ_xxxxotbVLa4imWrOV9LVHNYc/getMe
 
{"ok":true,"result":{"id":1157710367,"is_bot":true,"first_name":"prom_alert_bot","username":"lxtx_prom_alert_bot","can_join_groups":true,"can_read_all_group_messages":false,"supports_inline_queries":false}}
 
  
 
</pre>
 
</pre>
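
The note above about the literal "bot" prefix can be sketched in code. This is a minimal illustration, not part of the original notes: the helper name `bot_api_url` and the placeholder token are assumptions; only the `https://api.telegram.org/bot<token>/<method>` URL shape comes from the curl commands above.

```python
API_BASE = "https://api.telegram.org"


def bot_api_url(token: str, method: str) -> str:
    """Build the Bot API URL for a method such as getMe or getUpdates.

    The raw token from BotFather goes in unchanged; the literal "bot"
    prefix is added here, which is the detail that is easy to forget.
    """
    if token.lower().startswith("bot"):
        raise ValueError("pass the raw token; the 'bot' prefix is added automatically")
    return f"{API_BASE}/bot{token}/{method}"


if __name__ == "__main__":
    # Placeholder token for illustration only, not a real credential.
    print(bot_api_url("123456:EXAMPLE-TOKEN", "getMe"))
```

Keeping the prefix inside one helper avoids the mistake of pasting the token directly after the host name.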
  
=== Create the group===
+
==Enable and configure the nginx module ==
<pre>
 
Get the group ID


Create a new group in Telegram, add the newly created bot (prom_alert_bot) as a member, then call the API method getUpdates to get the group ID.
 
 
curl https://api.telegram.org/bot1157710367:AAFD9YLsjdQ_t7botbVLa4imWrOV9LVHNYc/getUpdates
 
{"ok":true,"result":[{"update_id":367831744,
 
"message":{"message_id":1,"from":{"id":796717144,"is_bot":false,"first_name":"evan","last_name":"lai","username":"linuxsa"},"chat":{"id":-470646458,"title":"alerm","type":"group","all_members_are_administrators":true},"date":1597202656,"new_chat_participant":{"id":1157710367,"is_bot":true,"first_name":"prom_alert_bot","username":"lxtx_prom_alert_bot"},"new_chat_member":{"id":1157710367,"is_bot":true,"first_name":"prom_alert_bot","username":"lxtx_prom_alert_bot"},"new_chat_members":[{"id":1157710367,"is_bot":true,"first_name":"prom_alert_bot","username":"lxtx_prom_alert_bot"}]}}]}
 
</pre>
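
Reading the group ID out of the getUpdates response by eye is error-prone, so here is a small sketch of doing it in code. The function name `group_chat_ids` is hypothetical, and the sample JSON is a trimmed copy of the response shown above. Note that the group id in that response is negative.

```python
import json


def group_chat_ids(get_updates_body: str) -> dict:
    """Map chat id -> title for every group chat found in a getUpdates response."""
    payload = json.loads(get_updates_body)
    if not payload.get("ok"):
        raise ValueError("Bot API reported an error")
    chats = {}
    for update in payload.get("result", []):
        chat = update.get("message", {}).get("chat", {})
        # Private chats are skipped; only group chats are of interest here.
        if chat.get("type") in ("group", "supergroup"):
            chats[chat["id"]] = chat.get("title", "")
    return chats


if __name__ == "__main__":
    sample = (
        '{"ok":true,"result":[{"update_id":367831744,"message":'
        '{"message_id":1,"chat":{"id":-470646458,"title":"alerm","type":"group"}}}]}'
    )
    print(group_chat_ids(sample))
```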
 
 
==telegram webhook ==
 
=== 1. Get the webhook running first ===
 
 
<pre>
 
<pre>
 +
sudo filebeat modules enable nginx
  
 +
Modify the settings in the /etc/filebeat/modules.d/nginx.yml file.
  
git clone https://github.com/evan886/alertmanager-webhook-telegram-python.git
 
cd  alertmanager-webhook-telegram-python/docker
 
docker build -t alertmanager-webhook-telegram:1.0 .
 
docker run -d --name telegram-bot \
 
-e "bottoken=1157710367:AxxxxxxQ_t7botbVLa4imWrOV9LVHNYc" \
 
-e "chatid=4706458" \
 
-e "username=evan" \
 
-e "password=evanLxx123" \
 
-p 9119:9119 alertmanager-webhook-telegram:1.0
 
 
</pre>
 
</pre>
  
==== Configuration ====
+
== Start Filebeat==
<pre>
 
cat alertmanager/config.yml
 
 
 
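# Define the routing tree. This top-level route receives every alert; child routes can be added, e.g. to decide who gets project: zhidaoAPP (a custom label set in the Prometheus alert rules) and who gets project: baoxian

route:

  group_by: ['alertname'] # labels used to group alerts

  group_wait: 10s        # how long to wait before sending the first notification for a new group

  group_interval: 60s    # how long to wait before notifying about new alerts in an existing group

  repeat_interval: 1h    # how often a still-firing alert is re-sent; for email do not set this too low, or the SMTP server may reject the volume of mail

  receiver: 'telegram-webhook'      # name of the receiver to notify; must match a name under receivers below



# Define the alert receivers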
 
receivers:
 
  - name: 'telegram-webhook'
 
    webhook_configs:
 
    - url: 'http://evan:<password>@<webhook_host>:9119/alert'  # password and host were masked in the source; substitute your own
 
 
 
</pre>
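
For orientation, this is roughly what a webhook receiver does with the JSON that Alertmanager posts to the `/alert` URL configured above. It is a hedged sketch, not the code of the cloned alertmanager-webhook-telegram-python project: the function name `format_alerts` and the message layout are assumptions. The fields used (`alerts`, `status`, `labels`, `annotations`) are part of Alertmanager's webhook payload.

```python
def format_alerts(payload: dict) -> str:
    """Turn an Alertmanager webhook payload into one text line per alert."""
    lines = []
    for alert in payload.get("alerts", []):
        labels = alert.get("labels", {})
        annotations = alert.get("annotations", {})
        status = alert.get("status", "unknown").upper()
        name = labels.get("alertname", "unnamed")
        text = f"[{status}] {name}"
        summary = annotations.get("summary", "")
        if summary:
            text += f": {summary}"
        lines.append(text)
    return "\n".join(lines)


if __name__ == "__main__":
    example = {
        "status": "firing",
        "alerts": [
            {
                "status": "firing",
                "labels": {"alertname": "node-up", "instance": "host1:9100"},
                "annotations": {"summary": "host1:9100 is down"},
            }
        ],
    }
    print(format_alerts(example))
```

The resulting text is what would then be sent to the group chat through the Bot API's sendMessage method.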
 
 
 
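=== Check the result===

Normally your Telegram group should receive messages at this point. If it does not, stop one node exporter to trigger an alert; if nothing arrives even then, something is wrong.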
 
 
 
== trouble==

Alertmanager would not start and kept reporting this error:

 level=error ts=2019-08-26T05:52:52.19072198Z caller=main.go:337 msg="Loading configuration file failed" file=/usr/local/prometheus/alertmanager/alertmanager.yml err="yaml: unmarshal errors:\n  line 12: field receivers not found in type config.plain"

Fix (聪's suggestion): quote the webhook URL:

 - url: 'http://user:password@172.24.103.122:9119/alert'
 
 
 
== bot  see also==
 
https://prometheus.io/docs/alerting/latest/configuration/
 
 
 
https://core.telegram.org/bots
 
 
 
[https://techsoftcenter.com/how-to-create-a-telegram-bot-id-chat-id/ How to Create a Telegram Bot ID/Chat ID]
 
 
 
[https://toolbox.kali-linuxtr.net/prometheus-alertmanager-telegram-bot.tool Prometheus Alertmanager Telegram Bot]
 
 
 
[https://www.cnblogs.com/KillBugMe/p/13140226.html 创建telegram 机器人 并发送消息]
 
 
 
[https://www.teleme.io/articles/create_your_own_telegram_bot?hl=zh-hans 如何创建我自己的电报机器人(Telegram Bot)]
 
 
 
[https://nova.moe/manage-host-alert-on-telegram-with-grafana/ 在 Telegram 中管理主机监控和警报信息]
 
 
 
https://github.com/inCaller/prometheus_bot
 
 
 
https://github.com/metalmatze/alertmanager-bot
 
 
 
[https://blog.csdn.net/weixin_34242331/article/details/91875514  基于prometheus + grafana + mysql + Telegram 监控告警]
 
 
 
https://my.oschina.net/54188zz/blog/3030618
 
 
 
[https://yunlzheng.gitbook.io/prometheus-book/parti-prometheus-ji-chu/alert/prometheus-alert-rule 自定义Prometheus告警规则]
 
 
 
[https://www.linux.org.ru/forum/general/14894302  prometheus alertmanager telegram ]
 
 
 
[https://www.cnblogs.com/wangxu01/articles/11654836.html 部署Alertmanager实现邮件/钉钉/微信报警]
 
 
 
[https://www.cnblogs.com/xiaobaozi-95/p/10740511.html prometheus告警插件-alertmanager]
 
 
 
 
 
 
 
[https://github.com/metalmatze/alertmanager-bot This is the Alertmanager bot for Prometheus that notifies you on alerts.]
 
 
 
 
 
 
 
 
 
 
 
 
[https://www.cnblogs.com/longcnblogs/p/9620733.html  Prometheus 和 Alertmanager实战配置]
 
 
 
== 微信==
 
 
 
[https://blog.csdn.net/knight_zhou/article/details/106937276  Prometheus 微信告警注意事项]
 
=webhook=
 
 
 
 
 
[https://blog.csdn.net/shida_csdn/article/details/81980021  prometheus alertmanager webhook 配置教程]
 
  
[https://blog.csdn.net/bluuusea/article/details/104619235  prometheus+alertmanager+webhook实现自定义监控报警系统]
 
 
=* intro =
 
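In the Prometheus architecture, alerting is split into two parts: alert rules are defined and alerts are generated in Prometheus Server, while the Alertmanager component handles the alerts that Prometheus produces. Alertmanager is the central place where alerts are processed in the Prometheus ecosystem.

Alertmanager ships with several built-in third-party notification integrations and also supports webhook notifications, which let users extend alert handling in more customized ways.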
 
 
=* ins=
 
==** using docker or docker-compose==


Use the bundled compose file.


https://hub.docker.com/r/prom/alertmanager/dockerfile

*** docker only

  docker pull prom/alertmanager

  docker run --name alertmanager  -d -p 9093:9093  -v /path/to/config.yml:/etc/alertmanager/alertmanager.yml prom/alertmanager
 
 
 
 
 
=* conf =
 
 
<pre>
 
<pre>
rules
+
The setup command loads the Kibana dashboards. If the dashboards are already set up, omit this command.
 
 
 
 
vim node-up.rules
 
groups:
 
- name: node-up
 
  rules:
 
  - alert: node-up
 
    expr: up{job="node-exporter"} == 0
 
    for: 15s
 
    labels:
 
      severity: 1
 
      team: node
 
    annotations:
 
      summary: "{{ $labels.instance }} has been down for more than 15s!"
 
 
 
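Explanation: this rule checks whether a node is alive. expr is a PromQL expression that tests whether targets of job="node-exporter" are up. for means that once the alert is Pending it must stay that way for 15s before it becomes Firing; only a Firing alert is sent to Alertmanager. labels and annotations attach extra identifying information to the alert. All of these labels and annotations, plus any labels already set for the job in prometheus.yml, are automatically included in the email body. For more detail on rule configuration, see



# Alert resolved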
 
  
 +
sudo filebeat setup
 +
sudo service filebeat start
 
</pre>
 
</pre>
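
The Pending to Firing behaviour of `for` described above can be illustrated with a small simulation. This is a simplified sketch under stated assumptions, not how Prometheus is implemented: the function name `alert_states` and the `(timestamp, condition)` sample format are hypothetical, and real Prometheus evaluates rules on its own evaluation interval.

```python
def alert_states(samples, for_seconds):
    """Return the alert state after each evaluation.

    samples: list of (timestamp_seconds, condition_is_true) tuples in time order.
    An alert is "pending" while the condition has held for less than
    for_seconds, "firing" once it has held for at least for_seconds,
    and "inactive" whenever the condition is false.
    """
    states = []
    active_since = None
    for timestamp, is_true in samples:
        if not is_true:
            active_since = None  # any false evaluation resets the timer
            states.append("inactive")
            continue
        if active_since is None:
            active_since = timestamp
        held_for = timestamp - active_since
        states.append("firing" if held_for >= for_seconds else "pending")
    return states


if __name__ == "__main__":
    evaluations = [(0, True), (5, True), (15, True), (20, False), (25, True)]
    print(alert_states(evaluations, for_seconds=15))
```

The reset on a false evaluation is why a flapping target may never reach Firing when `for` is longer than its outages.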
  
=* Custom alert rules=
+
== Module status==
==** CPU load custom alert rule==
 
 
<pre>
 
<pre>
  - alert: high_load-85per
+
Click the "Check data" button to the right of Module status, then open the Nginx logs dashboard
    expr: (100 - (avg by (instance, job) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)) > 80
 
    #expr: sum(avg without (cpu)(irate(node_cpu{mode!='idle'}[5m]))) by (instance) > 0.81
 
    #expr: node_load1 > 0.2
 
    for: 10m
 
    labels:
 
      severity: page
 
    annotations:
 
      summary: "Instance {{ $labels.instance }} under high load"
 
      description: "{{ $labels.instance }} of job {{ $labels.job }} has been under high load for more than 10 minutes."
 
 
 
Email is sent only once the alert is FIRING
 
</pre>
 
==** Memory custom alert rule==

<pre>#rules file: mind the leading spaces (indentation)

  - alert: hostMemUsageAlert
 
    expr: ((node_memory_MemTotal_bytes -(node_memory_MemFree_bytes+node_memory_Buffers_bytes+node_memory_Cached_bytes) )/node_memory_MemTotal_bytes ) * 100 > 90
 
    #expr: (node_memory_MemTotal - node_memory_MemAvailable)/node_memory_MemTotal > 0.85
 
    for: 1m
 
    labels:
 
      severity: page
 
    annotations:
 
      summary: "Instance {{ $labels.instance }} MEM usage high"

      description: "{{ $labels.instance }} MEM usage above 90% (current value: {{ $value }})"
 
 
 
  
 
</pre>
 
</pre>
Custom alert rule that worked (2020)
+
systemctl daemon-reload
https://www.shared-code.com/article/84
 
 
 
This one worked; the one above did not.
 
 
 
((node_memory_MemTotal_bytes -(node_memory_MemFree_bytes+node_memory_Buffers_bytes+node_memory_Cached_bytes) )/node_memory_MemTotal_bytes ) * 100 > 90
 
 
 
[https://www.shared-code.com/article/84  常用prometheus告警规则模板(三]
 
 
 
 
 
[https://www.bookstack.cn/read/prometheus-book/alert-prometheus-alert-rule.md 自定义Prometheus告警规则]
 
 
 
==** Disk custom alert rule==

<pre>

  - alert: LowDiskSpaceNodeFilesystemUsage
 
    expr: 100 - (node_filesystem_free_bytes{mountpoint="/",fstype=~"ext4|xfs"} / node_filesystem_size_bytes{fstype=~"ext4|xfs"} * 100) > 80
 
    for: 1m
 
    labels:
 
      severity: warning
 
    annotations:
 
      summary: "Instance {{ $labels.instance  }} :{{ $labels.mountpoint }} partition usage is too high"

      description: "{{ $labels.instance  }} : {{ $labels.job  }} :{{ $labels.mountpoint  }} usage of this partition is above 80% (current value: {{ $value }})"
 
</pre>
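
The arithmetic in the disk rule's expr (100 minus the free share of the filesystem, compared with 80) can be checked by hand with a few lines of code. The function names below are hypothetical and the byte counts are made-up inputs; only the formula mirrors the rule above.

```python
def disk_usage_percent(free_bytes: float, size_bytes: float) -> float:
    """Mirror the rule's formula: 100 - (free / size * 100)."""
    if size_bytes <= 0:
        raise ValueError("size_bytes must be positive")
    return 100 - (free_bytes / size_bytes * 100)


def disk_alert_condition(free_bytes: float, size_bytes: float, threshold: float = 80) -> bool:
    """True when usage is above the threshold, i.e. when the rule's expr would match."""
    return disk_usage_percent(free_bytes, size_bytes) > threshold


if __name__ == "__main__":
    # Illustrative values: a 100 GiB filesystem with 25 GiB free.
    gib = 1024 ** 3
    print(disk_usage_percent(25 * gib, 100 * gib))
    print(disk_alert_condition(25 * gib, 100 * gib))
```

Because the comparison is strict, a filesystem at exactly 80% usage does not match the rule.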
 
  
 
=see also=
 
=see also=
A new environment may also need things like alert grouping set up.
 
 
[https://blog.csdn.net/y_xiao_/article/details/50818451?utm_medium=distribute.pc_relevant.none-task-blog-BlogCommendFromMachineLearnPai2-1.add_param_isCf&depth_1-utm_source=distribute.pc_relevant.none-task-blog-BlogCommendFromMachineLearnPai2-1.add_param_isCf  Prometheus监控 - Alertmanager报警模块]
 
 
[https://my.oschina.net/OutOfMemory/blog/4706596 Prometheus监控告警浅析]
 
 
[https://www.cnblogs.com/winstom/p/11940570.html Alertmanager 部署配置]
 
 
[https://blog.51cto.com/lookingdream/2504572 Prometheus监控node_exporter的告警规则]
 
 
 
[https://juejin.im/post/6844903880778579976 Prometheus学习系列(三十九)之报警模板例子 ]
 
 
https://prometheus.io/docs/alerting/alertmanager/
 
 
[https://www.jianshu.com/p/239b145e2acc Prometheus Alertmanager报警组件]
 
 
[https://blog.csdn.net/qq_25178661/article/details/86690729 good-prometheus + AlertManager 实现对多node节点CPU和内存信息的监控]
 
 
[https://blog.csdn.net/kozazyh/article/details/80636512  prometheus-常用的监控告警规则]
 
 
[https://blog.51cto.com/jerrymin/2333824  Prometheus配合Alertmanager报警系统]
 
 
[https://www.cnblogs.com/longcnblogs/p/9620733.html Prometheus 和 Alertmanager实战配置]
 
 
[https://www.kancloud.cn/huyipow/prometheus/527563 alertmanager报警规则详解]
 
 
 
[https://blog.csdn.net/wang725/article/details/94174331  prometheus - 监控磁盘]
 
 
[https://blog.csdn.net/mnasd/article/details/86694412  Prometheus自定义监控部署]
 
 
[https://www.ctolib.com/docs/sfile/prometheus-book/alert/prometheus-alert-rule.html 自定义Prometheus告警规则]
 
 
[https://blog.csdn.net/weixin_33827731/article/details/92947113?utm_medium=distribute.pc_relevant_t0.none-task-blog-BlogCommendFromMachineLearnPai2-1.nonecase&depth_1-utm_source=distribute.pc_relevant_t0.none-task-blog-BlogCommendFromMachineLearnPai2-1.nonecase  监控指标以及prometheus规则-不断完善中]
 
 
[https://www.cnblogs.com/xiangsikai/p/11290000.html Prometheus 编写告警规则案例]
 
 
[https://www.jianshu.com/p/1f05476ebcee 使用prometheus自定义监控]
 
  
[https://blog.csdn.net/chubi7812/article/details/100612951?utm_medium=distribute.pc_relevant_t0.none-task-blog-BlogCommendFromMachineLearnPai2-1.nonecase&depth_1-utm_source=distribute.pc_relevant_t0.none-task-blog-BlogCommendFromMachineLearnPai2-1.nonecase  prometheus通过node_exporter抓取的数据准确计算磁盘使用率]
+
https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-module-nginx.html
  
=k8s =
 
  
[https://www.qikqiak.com/post/alertmanager-of-prometheus-in-practice/ Prometheus报警AlertManager实战]
+
[https://www.cnblogs.com/kuku0223/p/8317965.html ELK--filebeat nginx模块]
[[category:ops]] [[category:container]] [[category:prom]]
 

Revision as of 09:25, 27 April 2021 (Tue)


ins and config

Download and install Filebeat



curl -L -O https://artifacts.elastic.co/downloads/beats/filebeat/filebeat-7.12.0-amd64.deb
sudo dpkg -i filebeat-7.12.0-amd64.deb

Edit the configuration

Modify /etc/filebeat/filebeat.yml to set the connection information:

output.elasticsearch:
  hosts: ["<es_url>"]
  username: "elastic"
  password: "<password>"
setup.kibana:
  host: "<kibana_url>"

Where <password> is the password of the elastic user, <es_url> is the URL of Elasticsearch, and <kibana_url> is the URL of Kibana.

Enable and configure the nginx module

sudo filebeat modules enable nginx

Modify the settings in the /etc/filebeat/modules.d/nginx.yml file.

Start Filebeat

The setup command loads the Kibana dashboards. If the dashboards are already set up, omit this command.

sudo filebeat setup
sudo service filebeat start

Module status

Click the "Check data" button to the right of Module status, then open the Nginx logs dashboard

systemctl daemon-reload

see also

https://www.elastic.co/guide/en/beats/filebeat/current/filebeat-module-nginx.html


ELK--filebeat nginx模块