Collecting and Processing HAProxy Access Logs with Logstash and Filebeat

Filebeat handles log collection, filtering, and buffering. In large-scale log collection and analysis scenarios, it can significantly improve the ELK stack's ingestion performance while simplifying configuration.

Logstash, in turn, typically acts as a broker that centrally processes the logs shipped by the various Beats: based on the preset log type it applies the appropriate processing (adding fields, geoip lookups, and so on) and forwards the result to Elasticsearch. A centralized Logstash greatly reduces configuration complexity; any configuration change only has to be made on this one central Logstash.

This article uses that pattern to process HAProxy's default access log: Filebeat is installed on the HAProxy server to collect the log and ship it to Logstash.

1. Configuring Logstash

Here I split the configuration into three parts: input, log-processing filters, and output. This keeps the structure clear, and adding another log-processing configuration just means adding a new numbered file. For example, to add a Tomcat access-log configuration, create an 11.tomcat_access.conf.
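Such an 11.tomcat_access.conf would follow the same shape as the HAProxy filter below: gate on the event type, then parse. The grok pattern here is only a placeholder sketch (%{COMMONAPACHELOG} is one of Logstash's built-in patterns and happens to match Tomcat's common access-log format), not a tested configuration:

```conf
filter {
  if [type] == "tomcat_access" {
    grok {
      # placeholder pattern: adjust to match your actual Tomcat access-log format
      match => ["message", "%{COMMONAPACHELOG}"]
    }
  }
}
```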

[root@server25 conf.d]# ll
total 12
-rw-r--r-- 1 root root  216 Sep  2 14:34 100.output.conf
-rw-r--r-- 1 root root 1479 Sep  5 16:54 10.haproxy_access.conf
-rw-r--r-- 1 root root   63 Sep 26 10:23 1.input.conf

1.input.conf configuration

The listening IP and port:

[root@server25 conf.d]# cat 1.input.conf
input {
  beats {
    host => "0.0.0.0"
    port => 5044
  }
}

10.haproxy_access.conf configuration


[root@server25 conf.d]# cat 10.haproxy_access.conf
filter {
  if [type] == "haproxy_access" {
    grok {
      # Logstash's built-in haproxy patterns
      match => ["message", "%{HAPROXYHTTP}"]
    }
    grok {
      match => ["message", "%{HAPROXYDATE:accept_date}"]
    }
    date {
      # parse the accept_date field from the log and store it in @timestamp;
      # this @timestamp is 8 hours behind our local time, which the quoted
      # note below explains in detail
      match => ["accept_date", "dd/MMM/yyyy:HH:mm:ss.SSS"]
    }
    if [host] == "CMHAProxy02" {
      mutate {
        # add a SrvName field based on the hostname, for easy identification
        add_field => { "SrvName" => "haproxy53" }
      }
    }
    geoip {
      # geoip lookup on client_ip, for later map visualizations
      source => "client_ip"
    }
  }
}
100.output.conf configuration

Output to stdout:

[root@server25 conf.d]# cat 100.output.conf
output  {
        stdout  {
            codec => rubydebug
            }
}
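The stdout/rubydebug output above is handy for debugging. Once the pipeline looks right, you would typically swap in (or add alongside it) an elasticsearch output. A minimal sketch, assuming a local Elasticsearch node; the host and index name are placeholders, not part of the original setup:

```conf
output {
  elasticsearch {
    hosts => ["localhost:9200"]        # assumption: a local ES node
    index => "haproxy-%{+YYYY.MM.dd}"  # daily indices, derived from @timestamp
  }
}
```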

Date handling (the date filter)

As mentioned in an earlier chapter, the filters/date plugin converts the time string in a log record into a LogStash::Timestamp object and stores it in the @timestamp field.
Note: the %{+YYYY.MM.dd} notation commonly used later in outputs/elasticsearch must read the @timestamp field, so never delete @timestamp and keep only your own field; instead, convert it with filters/date first and then delete your own field.
This is obviously very useful when importing historical data, and it matters for real-time processing too, because pipelines generally have buffers, so the actual processing time always lags slightly behind the event time.
Tip: I strongly recommend enabling the buffer parameter of Nginx's access_log directive; it greatly improves peak response performance!

The above is quoted from http://kibana.logstash.es/content/logstash/plugins/filter/date.html; I will add updates here if a better solution comes along.
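The 8-hour gap mentioned earlier comes from @timestamp always being stored in UTC; Kibana converts it back to browser-local time for display, so it is usually not something to fix. If the source timestamps were ever being parsed in the wrong zone, the date filter's timezone option pins the input zone explicitly. A sketch, under the assumption that the HAProxy host logs in Asia/Shanghai local time:

```conf
filter {
  date {
    match    => ["accept_date", "dd/MMM/yyyy:HH:mm:ss.SSS"]
    timezone => "Asia/Shanghai"  # assumption: source timestamps are CST local time
  }
}
```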

2. Installing and configuring Filebeat

[root@CMHAProxy02 ~]# wget https://download.elastic.co/beats/filebeat/filebeat-1.2.3-x86_64.rpm
[root@CMHAProxy02 ~]# rpm -ivh filebeat-1.2.3-x86_64.rpm
The Filebeat configuration, with the commented-out defaults removed:
[root@CMHAProxy02 filebeat]# cat filebeat.yml
filebeat:
  spool_size: 1024                          # batch up to 1024 events before sending
  idle_timeout: "5s"                        # otherwise flush at least every 5 seconds
  registry_file: /var/lib/filebeat/registry # records file read positions; a relative path would resolve
                                            # against the current working directory, so running filebeat
                                            # from a different directory would cause duplicate shipping;
                                            # an absolute path avoids that
  prospectors:
    -
      paths:
        - /var/log/haproxy.log
      input_type: log
      # drop haproxy lines about service start/stop, backend state changes, and
      # admin/monitoring page hits, so only normal access records are shipped
      exclude_lines: ["started","Pausing","Enabling","DOWN","UP","admin_stats","backend"]
      exclude_files: [".gz$"]
      document_type: haproxy_access
output:
  logstash:
    hosts: ["110.20.30.40:5044"]
shipper:
logging:
  files:
    path: /var/log/mybeat
    name: mybeat
    rotateeverybytes: 10485760              # = 10 MB: rotate filebeat's own log every 10 MB
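Filebeat treats each exclude_lines entry as a regular expression matched against the raw line; a line matching any pattern is dropped before shipping. A quick Python sketch (the sample lines are illustrative, not from the original post) to check which lines the patterns above would filter:

```python
import re

# Patterns copied from the exclude_lines setting above.
EXCLUDE_PATTERNS = ["started", "Pausing", "Enabling", "DOWN", "UP",
                    "admin_stats", "backend"]

def is_excluded(line: str) -> bool:
    """Return True if any exclude pattern matches anywhere in the line."""
    return any(re.search(p, line) for p in EXCLUDE_PATTERNS)

# A service message is dropped; a normal access record passes through.
print(is_excluded("Oct 25 03:25:52 localhost haproxy[25910]: Proxy test started."))     # True
print(is_excluded('haproxy[25910]: 124.126.199.211:55180 "GET /index.html HTTP/1.1"'))  # False
```

Note that the match is an unanchored, case-sensitive substring search: "UP" will not match "up", but "backend" would also drop an otherwise normal access line whose URL happens to contain the word "backend".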

Logs shipped by Filebeat carry the following fields:

  • beat.hostname: the hostname of the machine running the beat
  • beat.name: the name set in the shipper section; defaults to beat.hostname if unset
  • @timestamp: the time the line was read
  • type: the value set via document_type
  • input_type: whether the event came from "log" or "stdin"
  • source: the full path of the source file
  • offset: the starting byte offset of the line
  • message: the log line itself
  • fields: any additional fixed fields are stored in this object

3. Starting Logstash and Filebeat

[root@server25 ~]# logstash -f /etc/logstash/conf.d/ &

[root@CMHAProxy02 ~]# service filebeat start

4. Inspecting the data on the Logstash console


"message" => "Oct 25 03:25:52 localhost haproxy[25910]: 124.126.199.211:55180 [25/Oct/2016:03:25:52.213] test ocs_static.server/ocs52 60/0/0/2/62 200 5379 - - --NI 2/2/0/0/0 0/0 \"GET /themes/default/css/style.css HTTP/1.1\"", "@version" => "1", "@timestamp" => "2016-10-24T19:25:52.213Z", "source" => "/var/log/haproxy.log", "type" => "haproxy_access", "input_type" => "log", "beat" => { "hostname" => "CMHAProxy02", "name" => "CMHAProxy02" }, "offset" => 330502, "count" => 1, "fields" => nil, "host" => "CMHAProxy02", "tags" => [ [0] "beats_input_codec_plain_applied" ], "syslog_timestamp" => "Oct 25 03:25:52", "syslog_server" => "localhost", "program" => "haproxy", "pid" => "25910", "client_ip" => "124.126.199.211", "client_port" => "55180", "accept_date" => [ [0] "25/Oct/2016:03:25:52.213", [1] "25/Oct/2016:03:25:52.213" ], "haproxy_monthday" => [ [0] "25", [1] "25" ], "haproxy_month" => [ [0] "Oct", [1] "Oct" ], "haproxy_year" => [ [0] "2016", [1] "2016" ], "haproxy_time" => [ [0] "03:25:52", [1] "03:25:52" ], "haproxy_hour" => [ [0] "03", [1] "03" ], "haproxy_minute" => [ [0] "25", [1] "25" ], "haproxy_second" => [ [0] "52", [1] "52" ], "haproxy_milliseconds" => [ [0] "213", [1] "213" ], "frontend_name" => "test", "backend_name" => "ocs_static.server", "server_name" => "ocs52", "time_request" => "60", "time_queue" => "0", "time_backend_connect" => "0", "time_backend_response" => "2", "time_duration" => "62", "http_status_code" => "200", "bytes_read" => "5379", "captured_request_cookie" => "-", "captured_response_cookie" => "-", "termination_state" => "--NI", "actconn" => "2", "feconn" => "2", "beconn" => "0", "srvconn" => "0", "retries" => "0", "srv_queue" => "0", "backend_queue" => "0", "http_verb" => "GET", "http_request" => "/themes/default/css/style.css", "http_version" => "1.1", "SrvName" => "haproxy54", "geoip" => { "ip" => "124.126.199.211", "country_code2" => "CN", "country_code3" => "CHN", "country_name" => "China", "continent_code" => "AS", "region_name" 
=> "22", "city_name" => "Beijing", "latitude" => 39.9289, "longitude" => 116.38830000000002, "timezone" => "Asia/Harbin", "real_region_name" => "Beijing", "location" => [ [0] 116.38830000000002, [1] 39.9289 ] } }
