Client-Side Aggregations

To minimize complexity and maximize scalability, metrics are aggregated and time-aligned at the source.

Aggregation runs through the Nagios service_perfdata_file_processing_command on a 30-second interval (0.02s of real/wall-clock time).

Details

Monitor CI -> /etc/nagios/conf.d -> Nagios interleaves and executes -> /var/log/nagios/perf.log
The Nagios service_perfdata_file_processing_command runs /opt/nagios/libexec/calc_perf_buckets.rb (~200 lines).
…debug log snippet …

flush 1m-avg CpuIdle - 1433451960 - 1433452020
time:val:delta:weight 1433451992:96.39:32:0.53
time:val:delta:weight 1433452052:96.18:28:0.47
aggregate: 96.292
 …
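
The aggregate in the debug log above can be reproduced by hand. The sketch below assumes, based only on the logged numbers, that each sample's weight is its delta divided by the sum of the deltas in the bucket (32/60 and 28/60, which the log shows rounded as 0.53 and 0.47). The names SAMPLES and weighted_average are hypothetical and are not taken from calc_perf_buckets.rb.

```ruby
# Samples copied from the debug log above: [value, delta_seconds].
SAMPLES = [[96.39, 32], [96.18, 28]].freeze

# Hypothetical helper: time-weighted mean of the samples in one bucket.
# Assumption: weight = delta / sum of deltas (inferred from the log output).
def weighted_average(samples)
  total_delta = samples.sum { |_value, delta| delta }.to_f
  weighted_sum = samples.sum { |value, delta| value * delta }
  weighted_sum / total_delta
end

puts weighted_average(SAMPLES).round(3) # prints 96.292
```

Using the unrounded weights matters here: multiplying by the rounded 0.53 and 0.47 from the log gives roughly 96.291, not the logged 96.292.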

Output is written to service.perflog, which logstash transports. Line format:

<epoch> <pretty time> <ci_id>:<ci_name>:<bucket-stat (1m-avg)> <perf blob (key=value space delimited)>
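
As an illustration of the line format above, here is a minimal parsing sketch. It assumes the pretty time is a single token without spaces and that ci_id contains no colon; neither is confirmed by the format description. PerfRecord, parse_perf_line and the sample line (including its ci_id, ci_name and time format) are made up for the example.

```ruby
# Hypothetical record type for one service.perflog line.
PerfRecord = Struct.new(:epoch, :pretty_time, :ci_id, :ci_name, :bucket, :metrics)

# Hypothetical parser. Assumptions: pretty time has no spaces, ci_id has no colon.
def parse_perf_line(line)
  epoch, pretty_time, ident, blob = line.strip.split(' ', 4)
  ci_id, rest = ident.split(':', 2)
  # The bucket stat is the last colon-separated part, so ci_name may contain colons.
  ci_name, _sep, bucket = rest.rpartition(':')
  metrics = blob.to_s.split(' ').each_with_object({}) do |pair, acc|
    key, value = pair.split('=', 2)
    # Keep the raw string when the value is not a plain number.
    acc[key] = Float(value, exception: false) || value
  end
  PerfRecord.new(Integer(epoch), pretty_time, ci_id, ci_name, bucket, metrics)
end

# Made-up sample line, for illustration only.
SAMPLE_LINE = '1433452020 2015-06-04T21:07:00Z 1234:web-compute:1m-avg CpuIdle=96.292'
record = parse_perf_line(SAMPLE_LINE)
puts record.bucket             # prints 1m-avg
puts record.metrics['CpuIdle'] # prints 96.292
```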