Maxkit: statsd

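statsd is a front-end proxy for the Graphite/Carbon metrics server. It was originally written in Node.js by Erik Kastner at Etsy, and implementations now exist in many programming languages. It is an event counter and aggregation service: it receives event counts and timings, performs basic calculations on them, and produces aggregated values. This makes it a good fit for collecting custom application metrics, because the application only has to keep sending events.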

collectd has shipped a statsd plugin since version 5.4, which embeds a statsd server inside collectd.

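statsd is a UDP daemon (TCP can be used instead). It collects the data that statsd clients send over a simple protocol, aggregates it, and periodically pushes the results to a backend such as Graphite or InfluxDB, where the data can be displayed with Grafana.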

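The system has three parts: client, server, and backend. The client is embedded in the application and sends the relevant metrics to the statsd server. The statsd server aggregates these metrics and periodically sends them to the backends. The backends store this time series data, which is then presented with a suitable charting tool.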

Installation

Node.js must be installed first. The version installed from EPEL is nodejs 6.11.3-1.el7.

yum install -y epel-release
yum install -y nodejs

To install Node.js 7 instead, use the following procedure:

# Install Node.js 7.x repository
curl -sL https://rpm.nodesource.com/setup_7.x | bash -

# Install Node.js and npm
yum install nodejs

Clone statsd directly from its GitHub repository and install it:

cd /usr/local/src

git clone https://github.com/etsy/statsd.git

cd statsd

npm install

Configuration

First, make a copy of the example configuration file:

cp exampleConfig.js config.js

Edit the Graphite settings in config.js:

vi config.js

{
  graphitePort: 2003, 
  graphiteHost: "localhost",
  port: 8125,
  backends: [ "./backends/graphite" ]
}

Then edit the storage schemas on the Graphite side:

vi /opt/graphite/conf/storage-schemas.conf

[carbon]
pattern = ^carbon\.
retentions = 60:90d

[stats]
pattern = ^stats.*
retentions = 10s:6h,10m:7d,1d:5y

[stats_counts]
pattern = ^stats_counts.*
retentions = 10s:6h,10m:7d,1d:5y

[collectd]
pattern = ^collectd.*
retentions = 10s:6h,10m:7d,1d:5y

[default_1min_for_1day]
pattern = .*
retentions = 60s:1d

The retentions value 10s:6h,10m:7d,1d:5y means:

6 hours of 10 seconds data
7 days of 10 mins data
5 years of 1 day data
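To get a feel for how much data a retentions value keeps, the string can be broken down into archives. The following is a minimal sketch, not Whisper's own code: the function names are made up for this post, and it assumes Whisper's unit letters (s, m, h, d, w, y, with a year counted as 365 days) and that a bare number in the second field is already a point count.

```javascript
// Seconds per unit letter, assuming Whisper's retention syntax (y = 365 days).
const UNIT_SECONDS = { s: 1, m: 60, h: 3600, d: 86400, w: 604800, y: 31536000 };

// Convert "10s", "6h", or a bare number of seconds such as "60" into seconds.
function toSeconds(text) {
  const match = /^(\d+)([smhdwy]?)$/.exec(text.trim());
  if (!match) {
    throw new Error(`unrecognized duration: ${text}`);
  }
  return Number(match[1]) * UNIT_SECONDS[match[2] || 's'];
}

// Split a retentions string into archives and work out how many
// datapoints each archive holds.
function parseRetentions(retentions) {
  return retentions.split(',').map((archive) => {
    const [precision, duration] = archive.split(':').map((part) => part.trim());
    const secondsPerPoint = toSeconds(precision);
    // A bare number in the second field is treated as a point count.
    const points = /^\d+$/.test(duration)
      ? Number(duration)
      : Math.floor(toSeconds(duration) / secondsPerPoint);
    return { secondsPerPoint, points };
  });
}

console.log(parseRetentions('10s:6h,10m:7d,1d:5y'));
```

For the schema above this gives 2160 points at 10 seconds, 1008 points at 10 minutes, and 1825 points at 1 day.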

With the retentions set as follows, more data is kept at finer resolution:

[carbon]
pattern = ^carbon\.
retentions = 60:90d

[stats]
pattern = ^stats.*
retentions = 10s:1d,30s:7d,1m:30d,15m:5y

[stats_counts]
pattern = ^stats_counts.*
retentions = 10s:1d,30s:7d,1m:30d,15m:5y

[collectd]
pattern = ^collectd.*
retentions = 10s:1d,30s:7d,1m:30d,15m:5y

[default_1min_for_1day]
pattern = .*
retentions = 60s:1d

The retentions value 10s:1d,30s:7d,1m:30d,15m:5y means:

1 day of 10 seconds data
7 days of 30 seconds data
30 days of 1 minute data
5 years of 15 minutes data

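A change to storage-schemas.conf only applies to newly created files, so the existing *.wsp files under /opt/graphite/storage/whisper have to be resized as well. See the Whisper Scripts documentation.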

# Resize the existing .wsp files to the new retentions
find /opt/graphite/storage/whisper/collectd -type f -name '*.wsp' -exec whisper-resize.py --nobackup {} 10s:6h 10m:7d 1d:5y \;

# Print the archive layout and size of each .wsp file
find /opt/graphite/storage/whisper/collectd -type f -name '*.wsp' -exec whisper-info.py {} \;

vim /opt/graphite/conf/storage-aggregation.conf

[lower]
pattern = \.lower$
xFilesFactor = 0.1
aggregationMethod = min

[min]
pattern = \.min$
xFilesFactor = 0.1
aggregationMethod = min

[upper]
pattern = \.upper(_\d+)?$
xFilesFactor = 0.1
aggregationMethod = max

[max]
pattern = \.max$
xFilesFactor = 0.1
aggregationMethod = max

[sum]
pattern = \.sum$
xFilesFactor = 0
aggregationMethod = sum

[gauges]
pattern = ^.*\.gauges\..*
xFilesFactor = 0
aggregationMethod = last

[count]
pattern = \.count$
xFilesFactor = 0
aggregationMethod = sum

[count_legacy]
pattern = ^stats_counts.*
xFilesFactor = 0
aggregationMethod = sum

[default_average]
pattern = .*
xFilesFactor = 0.3
aggregationMethod = average

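Metrics ending in .lower or .min keep only the minimum value when they are downsampled, and metrics ending in .upper or .max keep only the maximum. If fewer than 10% of the datapoints are present, None is stored.
Metrics ending in .count or .sum, and metrics under 'stats_counts', are summed over all values. None is stored only if no data was received at all.
Gauges keep the last value. All other metrics are averaged, and None is stored if fewer than 30% of the datapoints are present.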
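The effect of xFilesFactor and aggregationMethod can be sketched in a few lines. This is an illustration of the rules described above, not Carbon's or Whisper's implementation. The function name downsample is made up here, and null stands in for None.

```javascript
// Combine the finer-resolution datapoints that fall into one coarser slot.
// `null` represents a missing datapoint.
function downsample(points, xFilesFactor, aggregationMethod) {
  const known = points.filter((value) => value !== null);
  // Nothing to aggregate if no data arrived, or if too few points are known.
  if (known.length === 0 || known.length / points.length < xFilesFactor) {
    return null;
  }
  const total = known.reduce((a, b) => a + b, 0);
  switch (aggregationMethod) {
    case 'min': return Math.min(...known);
    case 'max': return Math.max(...known);
    case 'sum': return total;
    case 'last': return known[known.length - 1];
    case 'average': return total / known.length;
    default: throw new Error(`unknown aggregationMethod: ${aggregationMethod}`);
  }
}

// Only 2 of 10 datapoints are present (20%).
const slot = [4, null, null, null, null, null, null, null, null, 8];
console.log(downsample(slot, 0.1, 'min'));     // 20% >= 10%, so the minimum is kept
console.log(downsample(slot, 0.3, 'average')); // 20% < 30%, so nothing is stored
console.log(downsample(slot, 0, 'sum'));       // any data at all is summed
```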

Restart Graphite:

systemctl restart carbon
systemctl restart graphite

Starting statsd

There are three ways to start it.

Start it directly from the console:

cd /usr/local/src/statsd
node ./stats.js ./config.js

Run it as a systemd service:

vi /usr/lib/systemd/system/statsd.service

[Unit]
Description=statsd daemon

[Service]
ExecStart=/usr/bin/node /usr/local/src/statsd/stats.js /usr/local/src/statsd/config.js
ExecReload=/bin/kill -HUP $MAINPID
KillMode=process

[Install]
WantedBy=multi-user.target

Enable and start the service:

systemctl daemon-reload
systemctl enable statsd
systemctl start statsd

Install it as a service with the npm forever-service package:

cd /usr/local/src/statsd
sudo npm install -g forever
sudo npm install -g forever-service
sudo forever-service install statsd -s stats.js -o " config.js"
sudo service statsd start

statsd listens on UDP port 8125, which can be checked with:

netstat -nap | grep 8125

The following metrics then show up in Graphite:

stats.gauges.statsd.timestamp_lag

stats.statsd.graphiteStats.calculationtime
stats.statsd.graphiteStats.flush_length
stats.statsd.graphiteStats.flush_time
stats.statsd.graphiteStats.last_exception
stats.statsd.graphiteStats.last_flush

stats.statsd.bad_line_seen
stats.statsd.metrics_received
stats.statsd.packets_received
stats.statsd.processing_time

stats_counts.statsd.bad_line_seen
stats_counts.statsd.metrics_received
stats_counts.statsd.packets_received

statsd.numStats

Key Concepts

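buckets: Each stat has its own bucket. Buckets do not need to be defined in advance. When the data reaches Graphite, the periods ( . ) in a bucket name become folders.
values: Each stat has a value. How it is interpreted depends on the modifier. Values are generally integers.
flush: When the flush interval elapses (set by config.flushInterval, 10 seconds by default), the stats are aggregated and sent to a backend service.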

Usage

statsd uses a very simple line protocol:

<metricname>:<value>|<type>
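A line in this protocol is easy to take apart by hand. The sketch below is not statsd's own parser. The function name parseLine and the shape of the returned object are assumptions for illustration, and it expects the packed form (for example foo:1|c) with no spaces around the separators.

```javascript
// The four metric types covered in this post.
const KNOWN_TYPES = ['c', 'ms', 'g', 's'];

// Parse one protocol line into its parts, or return null if it is malformed.
function parseLine(line) {
  const colon = line.indexOf(':');
  if (colon <= 0) {
    return null; // no metric name or no value part
  }
  const name = line.slice(0, colon);
  const [value, type, rate] = line.slice(colon + 1).split('|');
  if (!value || !KNOWN_TYPES.includes(type)) {
    return null;
  }
  // An optional third field such as "@0.1" carries the sample rate.
  let sampleRate = 1;
  if (rate !== undefined) {
    if (!rate.startsWith('@') || !(Number(rate.slice(1)) > 0)) {
      return null;
    }
    sampleRate = Number(rate.slice(1));
  }
  // The value is kept as a string: gauges may carry a sign prefix and
  // set members do not have to be numbers.
  return { name, value, type, sampleRate };
}

console.log(parseLine('foo:1|c'));
console.log(parseLine('foo:1|c|@0.1'));
```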

It can be tested with nc:

echo "foo:1|c" | nc -u 127.0.0.1 8125

Graphite then gains these metrics:

stats.foo
stats_counts.foo

Metric Types

Counting

foo:1|c

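This adds 1 to foo. At each flush, the count is sent to the backend and reset to 0.

If config.deleteCounters is set, a counter whose count is 0 at flush time is not sent to the backend.

With a 10-second flush interval, if a counter is incremented 7 times within one interval, the count (stats_counts.foo) is 7 and the per-second rate (stats.foo) is 0.7. Note that statsd.numStats reports how many stats were flushed in the interval, not how many increments were received.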
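The relation between the count and the per-second rate can be sketched as follows. This is a simplified illustration rather than statsd's flush code. The function name flushCounters is made up, and the stats. and stats_counts. prefixes follow the metric names seen in Graphite earlier in this post.

```javascript
// Turn the accumulated counters into the metrics sent at flush time,
// then reset each counter to 0.
function flushCounters(counters, flushIntervalMs) {
  const seconds = flushIntervalMs / 1000;
  const metrics = {};
  for (const name of Object.keys(counters)) {
    metrics[`stats_counts.${name}`] = counters[name];  // raw count
    metrics[`stats.${name}`] = counters[name] / seconds; // per-second rate
    counters[name] = 0;
  }
  return metrics;
}

// Seven increments of "foo" within one 10-second interval.
const counters = {};
for (let i = 0; i < 7; i += 1) {
  counters.foo = (counters.foo || 0) + 1;
}
const flushed = flushCounters(counters, 10000);
console.log(flushed);
```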

Sampling

foo:1|c|@0.1

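The trailing @0.1 tells statsd that this counter is sampled: the client only sends it 1/10 of the time. statsd compensates by dividing the received value by the sample rate, so each such packet counts as 10.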
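Both halves of sampling can be sketched briefly: the client drops most events, and the server scales up what it receives. The function names sampledLine and addSampled are hypothetical, and the random source is passed in only so the behavior can be reproduced.

```javascript
// Client side: emit the counter line only for a fraction of the events.
// Returns null when this event is skipped.
function sampledLine(name, sampleRate, random = Math.random) {
  if (random() >= sampleRate) {
    return null;
  }
  return `${name}:1|c|@${sampleRate}`;
}

// Server side: scale the received value by the inverse of the sample rate,
// so the counter approximates the true number of events.
function addSampled(counters, name, value, sampleRate) {
  counters[name] = (counters[name] || 0) + value * (1 / sampleRate);
  return counters;
}

console.log(sampledLine('foo', 0.1, () => 0.05)); // sent
console.log(sampledLine('foo', 0.1, () => 0.5));  // skipped
console.log(addSampled({}, 'foo', 1, 0.1));
```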

Timing

Timers record how long an operation took.

foo:320|ms

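Here foo took 320 ms to complete.

For each flush interval, statsd automatically computes the percentiles, average (mean), standard deviation, sum, and lower and upper bounds.

Suppose a series of timer values is sent to statsd within one flush interval.

statsd then computes the following values and sends them to Graphite: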

mean_90 496
upper_90 844
sum_90 3472
upper 994
lower 120
count 8
sum 4466
mean 558.25
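The calculation behind these figures can be sketched as below. The eight input values are an assumption chosen for illustration, not taken from this post, although they do reproduce the figures listed above. The function name timerStats is made up, the rounding rule for the percentile cut-off is assumed, and the input is expected to be non-empty.

```javascript
// Compute the per-interval timer statistics for one percentile threshold.
function timerStats(values, percentile) {
  const sorted = [...values].sort((a, b) => a - b);
  const count = sorted.length;
  const sum = sorted.reduce((a, b) => a + b, 0);

  // Keep the lowest N values, where N is the percentile share of the count.
  const kept = Math.round((percentile / 100) * count);
  const keptValues = sorted.slice(0, kept);
  const keptSum = keptValues.reduce((a, b) => a + b, 0);

  return {
    [`mean_${percentile}`]: keptSum / kept,
    [`upper_${percentile}`]: keptValues[kept - 1],
    [`sum_${percentile}`]: keptSum,
    upper: sorted[count - 1],
    lower: sorted[0],
    count,
    sum,
    mean: sum / count,
  };
}

// Hypothetical timings in milliseconds for one flush interval.
const timings = [450, 120, 553, 994, 334, 844, 675, 496];
console.log(timerStats(timings, 90));
```

With a threshold of 90 and 8 values, the 7 lowest values are kept, so the slowest timing (994) is left out of mean_90, upper_90, and sum_90.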

Gauges

A gauge is an arbitrary value that is recorded as-is.

gaugor:333|g

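If the gauge has not changed by the next flush, the previous value is sent again. Setting config.deleteGauges stops it from being resent.

Prefixing the value with + or - changes the gauge by that amount instead of overwriting it. This also means a gauge cannot be set directly to a negative number; it has to be set to 0 first.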

gaugor:333|g
gaugor:-10|g
gaugor:+4|g

The resulting value of gaugor is 333 - 10 + 4 = 327.
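The sign rule can be expressed in a few lines. This is a minimal sketch with a made-up function name (applyGauge), assuming the raw value arrives as the string found between the : and | of the protocol line.

```javascript
// Apply one gauge update. A leading + or - means "change by this amount";
// anything else overwrites the current value.
function applyGauge(gauges, name, rawValue) {
  const isDelta = /^[+-]/.test(rawValue);
  const number = Number(rawValue);
  gauges[name] = isDelta ? (gauges[name] || 0) + number : number;
  return gauges;
}

const gauges = {};
for (const raw of ['333', '-10', '+4']) {
  applyGauge(gauges, 'gaugor', raw);
}
console.log(gauges.gaugor); // 327
```

Because "-10" is always read as a change, a gauge can only reach -10 by sending 0 first and then -10.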

Sets

Sets count the unique values seen between flushes. They can be used, for example, to track how many distinct users triggered an event during the interval.

request:1|s  // 1
request:2|s  // 1 2
request:1|s  // 1 2
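The same behavior maps naturally onto a JavaScript Set. The sketch below uses hypothetical function names (recordSetValue, flushSets) and reports only the number of unique members at flush time.

```javascript
// Remember a value for the named set; duplicates are ignored by Set.
function recordSetValue(sets, name, value) {
  if (!sets[name]) {
    sets[name] = new Set();
  }
  sets[name].add(value);
}

// At flush time, report how many unique values each set saw, then start over.
function flushSets(sets) {
  const counts = {};
  for (const name of Object.keys(sets)) {
    counts[name] = sets[name].size;
    sets[name] = new Set();
  }
  return counts;
}

const sets = {};
for (const value of ['1', '2', '1']) {
  recordSetValue(sets, 'request', value);
}
const uniqueCounts = flushSets(sets);
console.log(uniqueCounts); // { request: 2 }
```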

Multi-Metric Packets

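Several metrics can be sent in one packet by separating them with \n. Keep the total length within the payload limit of a single packet on the network path, for example 1432 bytes for Fast Ethernet.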

gorets:1|c\nglork:320|ms\ngaugor:333|g\nuniques:765|s
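A client that batches metrics has to respect that size limit. The following sketch groups lines into packets greedily. The function name packMetrics is made up, the 1432-byte default is the Fast Ethernet figure mentioned above, and a single line longer than the limit is simply given a packet of its own.

```javascript
// Group metric lines into newline-separated packets of at most maxBytes each.
function packMetrics(lines, maxBytes = 1432) {
  const packets = [];
  let current = '';
  for (const line of lines) {
    const candidate = current === '' ? line : `${current}\n${line}`;
    if (current === '' || Buffer.byteLength(candidate) <= maxBytes) {
      current = candidate; // still fits, or first line of a new packet
    } else {
      packets.push(current); // packet is full, start a new one
      current = line;
    }
  }
  if (current !== '') {
    packets.push(current);
  }
  return packets;
}

const lines = ['gorets:1|c', 'glork:320|ms', 'gaugor:333|g', 'uniques:765|s'];
console.log(packMetrics(lines));     // everything fits in one packet
console.log(packMetrics(lines, 25)); // a tiny limit forces several packets
```

Each returned string can then be sent as one UDP datagram.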

Integrating statsd into collectd

Letting collectd run statsd removes one daemon from the system, but this setup is not the one currently in use here.

Edit /etc/collectd.conf:

LoadPlugin statsd

<Plugin statsd>
  Host "0.0.0.0"
  Port "8125"
#  DeleteCounters true
#  DeleteTimers   false
#  DeleteGauges   false
  DeleteSets     true
  CounterSum     true
  TimerPercentile 90.0
#  TimerPercentile 95.0
#  TimerPercentile 99.0
  TimerLower     true
#  TimerUpper     false
#  TimerSum       false
#  TimerCount     false
</Plugin>

Restart collectd:

systemctl restart collectd

statsd again listens on UDP port 8125, which can be checked with netstat, but the port is now handled by the collectd process:

netstat -nap | grep 8125

Testing it with nc:

echo "foo:1|c" | nc -u 127.0.0.1 8125

shows that in Graphite the metrics are now placed under collectd:

collectd.testserver.statsd.count-foo
collectd.testserver.statsd.derive-foo

clients

StatsD Example Clients: standalone test clients in many programming languages.

3rd Party Client Implementations: third-party StatsD libraries.

Take node-statsd as an example.

Install the node-statsd library:

npm install -g node-statsd

Write a test program that sends API response times to statsd as timings:

vi test.js

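'use strict';

// Client for the local statsd server on the default UDP port.
const StatsD = require('node-statsd');
const client = new StatsD({
  host: 'localhost',
  port: 8125
});

// Once per second, report a random response time (0-99 ms) as the "api" timer.
setInterval(function () {
  const responseTime = Math.floor(Math.random() * 100);
  client.timing('api', responseTime, function (error, bytes) {
    // The callback receives any send error and the number of bytes sent.
    if (error) {
      console.error(error);
    } else {
      console.log(`Successfully sent ${bytes} bytes, responseTime: ${responseTime}`);
    }
  });
}, 1000);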

Run the test program:

export NODE_PATH=/usr/lib/node_modules
node test.js

The stats.timers.api.* metrics are then available in Graphite.

2017/12/04
