Default Monitor Thresholds

The tables below list the default monitor thresholds that are implicitly added in all environments. As an app owner, you should review these thresholds and adjust them to suit your app.

| Monitor Type | Resource Name | Threshold Definition | Description | Action |
|---|---|---|---|---|
| CPU Load Heartbeat | compute | | If collection of any of the load metrics (load1, load5, or load15) is missed, a missing-heartbeat event is raised and the compute instance is marked unhealthy. | An unhealthy notification is raised and a repair action is executed on the affected instance. |
| CPU Load | compute | `'HighLoad' => threshold('1m','avg','load5',trigger('>=',30,3,1),reset('<',15,1,1))` | The compute is heavily loaded when the average load5 value reaches 30 or more, which sets the trigger. It resets when load5 drops below 15. | Notify only. No action. |
| CPU Usage | compute | `'HighCpuUsage' => threshold('5m','avg','CpuIdle',trigger('<=',10,15,2),reset('>',15,15,1))` | CPU utilization is very high when CpuIdle drops to 10% or lower, meaning at least 90% of the CPU is in use. | Notify only. No action. |
| Socket Connection | compute | | No default threshold is defined. The monitor can be set up for different connection states, such as TIME_WAIT, ESTABLISHED, and CLOSE_WAIT. | |
| Network | compute | | No default threshold is defined. | |
| Filesystem | root volume `/` | `'LowDiskSpace' => threshold('1m', 'avg', 'space_used', trigger('>=', 90, 5, 2), reset('<', 85, 5, 1))`<br>`'LowDiskInode' => threshold('1m', 'avg', 'inode_used', trigger('>=', 90, 5, 2), reset('<', 85, 5, 1))` | LowDiskSpace: the compute is low on disk space when space_used on the root disk reaches 90% or more.<br>LowDiskInode: the compute is low on inodes when inode_used on the root disk reaches 90% or more. | Notify only. No action. |
| System messages | file `/var/log/messages` | | | |
| Memory | compute | `'HighMemUse' => threshold('1m', 'avg', 'free', trigger('<', 50000, 5, 4), reset('>', 80000, 5, 4))` | The compute is using too much memory when available (free) memory drops below 50 MB. | Notify only. No action. |
| Process (cron) | crond process | `'CrondProcessLow' => threshold('1m', 'avg', 'count', trigger('<', 1, 1, 1), reset('>=', 1, 1, 1))`<br>`'CrondProcessHigh' => threshold('1m', 'avg', 'count', trigger('>=', 200, 1, 1), reset('<', 200, 1, 1))` | CrondProcessLow: the crond process should always be running; the alert is raised when the process count drops below 1.<br>CrondProcessHigh: the crond process count should stay below 200; the alert is raised when it reaches 200 or more. | Notify only. No action. |
| Process (sendmail) | postfix process | `'PostfixProcessLow' => threshold('1m', 'avg', 'count', trigger('<', 1, 1, 1), reset('>=', 1, 1, 1))`<br>`'PostfixProcessHigh' => threshold('1m', 'avg', 'count', trigger('>=', 200, 1, 1), reset('<', 200, 1, 1))` | PostfixProcessLow: the postfix process should always be running; the alert is raised when the process count drops below 1.<br>PostfixProcessHigh: the postfix process count should stay below 200; the alert is raised when it reaches 200 or more. | Notify only. No action. |
| Process (SSH daemon) | sshd process | `'SshdProcessLow' => threshold('1m', 'avg', 'count', trigger('<', 1, 1, 1), reset('>=', 1, 1, 1))`<br>`'SshdProcessHigh' => threshold('1m', 'avg', 'count', trigger('>=', 200, 1, 1), reset('<', 200, 1, 1))` | SshdProcessLow: the sshd process should always be running; the alert is raised when the process count drops below 1.<br>SshdProcessHigh: the sshd process count should stay below 200; the alert is raised when it reaches 200 or more. | Notify only. No action. |
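The threshold definitions in these tables share one shape. The Ruby sketch below models that shape so a default can be replayed against sample values. It assumes the arguments mean `threshold(bucket, stat, metric, trigger(op, value, duration, numocc), reset(op, value, duration, numocc))`, that each sample passed in is already the aggregated `stat` for one `bucket`, and that a condition is met when at least `numocc` of the last `duration` samples satisfy it. `Condition`, `Threshold`, and `replay` are hypothetical names for illustration only, not part of any monitoring API.

```ruby
# Hypothetical model of the threshold DSL used in the tables above.
# Assumed meaning (not confirmed by this page):
#   threshold(bucket, stat, metric, trigger(op, value, duration, numocc),
#             reset(op, value, duration, numocc))
Condition = Struct.new(:op, :value, :duration, :numocc) do
  # Met when at least `numocc` of the last `duration` samples satisfy `sample op value`.
  def met?(samples)
    samples.last(duration).count { |s| s.public_send(op, value) } >= numocc
  end
end

# bucket and stat are recorded but not applied: samples are assumed pre-aggregated.
Threshold = Struct.new(:bucket, :stat, :metric, :trigger, :reset)

def trigger(op, value, duration, numocc)
  Condition.new(op, value, duration, numocc)
end

def reset(op, value, duration, numocc)
  Condition.new(op, value, duration, numocc)
end

def threshold(bucket, stat, metric, trigger, reset)
  Threshold.new(bucket, stat, metric, trigger, reset)
end

# Replays one value per bucket and returns the state after each value.
# While :ok only the trigger is checked; while :triggered only the reset is.
# After a state change, samples from before the change are ignored.
def replay(th, values)
  state = :ok
  start = 0 # index of the first sample seen since the last state change
  values.each_index.map do |i|
    cond = state == :ok ? th.trigger : th.reset
    if cond.met?(values[start..i])
      state = state == :ok ? :triggered : :ok
      start = i + 1
    end
    state
  end
end

high_load = threshold('1m', 'avg', 'load5', trigger('>=', 30, 3, 1), reset('<', 15, 1, 1))
p replay(high_load, [5, 31, 25, 20, 14])
# => [:ok, :triggered, :triggered, :triggered, :ok]
```

Under the same assumptions, `HighCpuUsage` (`trigger('<=',10,15,2)`) ignores a single dip in CpuIdle and fires only once two of the last 15 samples are at 10 or lower, which is the point of the `numocc` argument.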

Volume /app Thresholds

| Monitor Type | Resource Name | Threshold Definition | Description | Action |
|---|---|---|---|---|
| Filesystem | `/app` volume | `'LowDiskSpaceCritical' => threshold('1m', 'avg', 'space_used', trigger('>=', 90, 5, 2), reset('<', 85, 5, 1))`<br>`'LowDiskInodeCritical' => threshold('1m', 'avg', 'inode_used', trigger('>=', 90, 5, 2), reset('<', 85, 5, 1))` | LowDiskSpaceCritical: the volume is low on disk space when space_used on `/app` reaches 90% or more.<br>LowDiskInodeCritical: the volume is low on inodes when inode_used on `/app` reaches 90% or more. | Notify only. No action. |
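To sanity-check the `/app` disk-space threshold by hand, a value comparable to space_used can be read from the Capacity column of POSIX `df -P` output. The Ruby sketch below does that for one mount point. Treating Capacity as the monitor's space_used is an assumption, `capacity_percent` is a hypothetical helper, and inode_used is not covered because inode reporting in `df` is not part of POSIX.

```ruby
# Sketch only: reads the Capacity column of POSIX `df -P` output for a mount
# point. Treating Capacity as space_used is an assumption, and this simple
# parser assumes device and mount names contain no spaces.
def capacity_percent(df_output, mount)
  df_output.each_line.drop(1).each do |line| # skip the header line
    # Columns: Filesystem, 1024-blocks, Used, Available, Capacity, Mounted on
    fields = line.split
    return fields[4].delete('%').to_i if fields[5] == mount
  end
  nil # mount point not found
end

df_output = <<~TEXT
  Filesystem 1024-blocks Used Available Capacity Mounted on
  /dev/sda1 10000 4200 5800 42% /
  /dev/sdb1 50000 46000 4000 92% /app
TEXT

used = capacity_percent(df_output, '/app')
puts used       # 92
puts used >= 90 # true: meets the LowDiskSpaceCritical trigger condition (>= 90)
```

On a host, the input could come from Ruby's backtick operator, for example `` `df -P /app` ``.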

Tomcat Thresholds

| Monitor Type | Resource Name | Threshold Definition | Description | Action |
|---|---|---|---|---|
| Tomcat process | tomcat-daemon | `'TomcatDaemonProcessDown' => threshold('1m', 'avg', 'up', trigger('<=', 98, 1, 1), reset('>', 95, 1, 1))` | The Tomcat daemon process is considered down when its availability (`up`) drops to 98% or lower. Although the threshold is expressed as a percentage, in practice a trigger means the process no longer exists. Do not change these values to 100%. | Notify only. No action. |
| JvmInfo | tomcat | `'HighMemUse' => threshold('1m','avg', 'percentUsed',trigger('>=',90,5,1),reset('<',85,5,1))` | Values are calculated from `http://localhost:#{port}/manager/status?XML=true`. | |
| ThreadInfo | tomcat | `'HighThreadUse' => threshold('5m','avg','percentBusy',trigger('>=',90,5,1),reset('<',85,5,1))` | Values are calculated from `http://localhost:#{port}/manager/status?XML=true`. | |
| RequestInfo | tomcat | No threshold defined. | Values are calculated from `http://localhost:#{port}/manager/status?XML=true`. | |
| Log | tomcat | `'CriticalLogException' => threshold('15m', 'avg', 'logtomcat_criticals', trigger('>=', 1, 15, 1), reset('<', 1, 15, 1))` | | |
| AppVersion | tomcat | | | |
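The JvmInfo and ThreadInfo values come from the Tomcat manager status page. The Ruby sketch below shows one plausible way to turn that XML into `percentUsed` and `percentBusy`. It assumes the `<jvm><memory free total max/>` and `<connector><threadInfo maxThreads currentThreadsBusy/>` layout of Tomcat's status XML, and both formulas are assumptions; the page does not say how the monitor computes them. `tomcat_metrics` is a hypothetical helper.

```ruby
require 'rexml/document'

# Sketch only: derives JvmInfo percentUsed and ThreadInfo percentBusy from the
# XML returned by http://localhost:#{port}/manager/status?XML=true.
# The element/attribute names and both formulas are assumptions.
def tomcat_metrics(xml)
  doc = REXML::Document.new(xml)

  mem   = REXML::XPath.first(doc, '//jvm/memory')
  free  = mem.attributes['free'].to_f
  total = mem.attributes['total'].to_f
  max   = mem.attributes['max'].to_f
  # Assumed: used heap (total - free) as a percentage of the maximum heap.
  percent_used = (total - free) * 100 / max

  # Assumed: busy threads as a percentage of maxThreads, summed over connectors.
  busy = 0.0
  max_threads = 0.0
  REXML::XPath.each(doc, '//connector/threadInfo') do |info|
    busy        += info.attributes['currentThreadsBusy'].to_f
    max_threads += info.attributes['maxThreads'].to_f
  end
  percent_busy = max_threads.zero? ? 0.0 : busy * 100 / max_threads

  { 'percentUsed' => percent_used, 'percentBusy' => percent_busy }
end

sample = <<~XML
  <status>
    <jvm><memory free="100" total="500" max="1000"/></jvm>
    <connector name="http-nio-8080">
      <threadInfo maxThreads="200" currentThreadCount="50" currentThreadsBusy="190"/>
    </connector>
  </status>
XML

metrics = tomcat_metrics(sample)
puts metrics['percentUsed'] # 40.0, below the HighMemUse trigger (>= 90)
puts metrics['percentBusy'] # 95.0, at or above the HighThreadUse trigger (>= 90)
```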

Artifact App – App-Specific Thresholds

| Monitor Type | Resource Name | Threshold Definition | Description | Action |
|---|---|---|---|---|
| Exception Monitoring | artifact level | Log path: `/log/logmon/logmon.log`; pattern to look for: `Exception`; threshold: 1 (alert on every occurrence); severity: Major; if more than 2: Critical<br>`'CriticalLogException' => threshold('15m', 'avg', 'logtomcat_criticals', trigger('>=', 1, 15, 1), reset('<', 1, 15, 1))`, `'logfile' => '/log/apache-tomcat/catalina.out'`, `'warningpattern' => 'WARNING'`, `'criticalpattern' => 'CRITICAL'` | The `logfile`, `warningpattern`, and `criticalpattern` parameters define the file to monitor and the warning and critical patterns to look for. | Notify only. No action. |
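The `logfile`, `warningpattern`, and `criticalpattern` parameters describe a line-matching log monitor. The Ruby sketch below counts warning and critical matches in a set of log lines and checks the `CriticalLogException` condition (`>= 1`). It assumes each matching line counts once per pattern; whether the real monitor counts lines this way is an assumption, and `count_log_patterns` is a hypothetical helper.

```ruby
# Sketch only: counts log lines matching the warning and critical patterns.
# Defaults for the artifact are 'WARNING' and 'CRITICAL' on
# /log/apache-tomcat/catalina.out; the counting logic is an assumption.
def count_log_patterns(lines, warningpattern:, criticalpattern:)
  warning_re  = Regexp.new(warningpattern)
  critical_re = Regexp.new(criticalpattern)
  counts = { warnings: 0, criticals: 0 }
  lines.each do |line|
    counts[:warnings]  += 1 if warning_re.match?(line)
    counts[:criticals] += 1 if critical_re.match?(line)
  end
  counts
end

log_lines = [
  'INFO Server startup in 1200 ms',
  'WARNING Connection pool nearly exhausted',
  'CRITICAL Unable to reach database'
]
counts = count_log_patterns(log_lines, warningpattern: 'WARNING', criticalpattern: 'CRITICAL')
# CriticalLogException uses trigger('>=', 1, 15, 1): one critical line is enough.
puts counts[:criticals] >= 1 # true
```

Against a real file, `File.foreach('/log/apache-tomcat/catalina.out')` (called without a block) returns an enumerator of lines that can be passed in place of the array.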

Apache Server Thresholds

| Monitor Type | Resource Name | Threshold Definition | Description | Action |
|---|---|---|---|---|
| ServerStatus | Apache | `'TooBusy' => threshold('5m','avg','idle_workers',trigger('<',5,5,5),reset('>',5,5,5))`<br>`'HighUserCpu' => threshold('5m','avg','cpu_user',trigger('>',60,5,1),reset('<',60,5,1))`<br>`'HighSysCpu' => threshold('5m','avg','cpu_sys',trigger('>',30,5,1),reset('<',30,5,1))` | All metrics are calculated from `http://localhost:#{port}/server-status`. | Notify only. No action. |
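The `TooBusy` threshold watches the number of idle Apache workers. The Ruby sketch below parses the machine-readable form of the status page (`server-status?auto`, one `Key: value` pair per line) and applies the `idle_workers < 5` condition. Mapping `idle_workers` to the `IdleWorkers` field is an assumption, and `cpu_user` and `cpu_sys` are left out because the page does not say which status fields they are derived from. `parse_server_status` and `too_busy?` are hypothetical helpers.

```ruby
# Sketch only: reads IdleWorkers from Apache mod_status machine-readable output
# and applies the TooBusy trigger condition (idle_workers < 5).
# Mapping idle_workers to IdleWorkers is an assumption.
def parse_server_status(text)
  text.each_line.with_object({}) do |line, fields|
    key, value = line.split(':', 2)
    next if value.nil? # skip lines without a "Key: value" pair
    fields[key.strip] = value.strip
  end
end

def too_busy?(fields)
  fields.fetch('IdleWorkers').to_i < 5
end

status = <<~TEXT
  Total Accesses: 1042
  BusyWorkers: 148
  IdleWorkers: 2
  Scoreboard: WWWW__
TEXT

fields = parse_server_status(status)
puts fields['IdleWorkers'] # 2
puts too_busy?(fields)     # true
```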

ActiveMQ Thresholds

| Monitor Type | Resource Name | Threshold Definition | Description | Action |
|---|---|---|---|---|
| BrokerStatus | activemq | | Metric values are calculated from:<br>queues: `<protocol>://<host>:<port>/admin/xml/queues.jsp`<br>topics: `<protocol>://<host>:<port>/admin/xml/topics.jsp` | |
| Log | activemq | `'CriticalLogException' => threshold('15m', 'avg', 'logtomcat_criticals', trigger('>=', 1, 15, 1), reset('<', 1, 15, 1))`, `'logfile' => '/opt/apache-activemq-5.5.1/data/wrapper.log'`, `'warningpattern' => 'OutOfMemory'`, `'criticalpattern' => 'OutOfMemory'` | The `logfile`, `warningpattern`, and `criticalpattern` parameters define the file to monitor and the warning and critical patterns to look for. Log path: `/log/logmon/logmon.log`; pattern to look for: `Exception`. | Notify only. No action. |
| Memory | activemq | No threshold defined. Parameters: `'protocol' => 'http'`, `'port' => '8161'`, `'path' => '/admin/index.jsp?printable=true'` | Metric values are calculated from `<protocol>://<host>:<port>/admin/index.jsp?printable=true`. | Notify only. No action. |
| Process | Daemon | `'ActiveMQDaemonProcessDown' => threshold('1m', 'avg', 'up', trigger('<=', 98, 1, 1), reset('>', 95, 1, 1))` | | Notify only. No action. |
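The BrokerStatus metrics are read from the ActiveMQ admin XML pages. The Ruby sketch below extracts per-queue statistics from the `queues.jsp` output into a hash. It assumes a `<queues><queue name="..."><stats size consumerCount enqueueCount dequeueCount/></queue></queues>` layout, which should be verified against your broker version; no default threshold is implied, and `queue_stats` is a hypothetical helper.

```ruby
require 'rexml/document'

# Sketch only: extracts per-queue statistics from the XML served at
# <protocol>://<host>:<port>/admin/xml/queues.jsp.
# The element and attribute names are assumptions; verify them on your broker.
def queue_stats(xml)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, '//queues/queue').each_with_object({}) do |queue, result|
    stats = queue.elements['stats']
    next if stats.nil? # skip queues without a stats element
    result[queue.attributes['name']] = {
      size: stats.attributes['size'].to_i,
      consumers: stats.attributes['consumerCount'].to_i,
      enqueued: stats.attributes['enqueueCount'].to_i,
      dequeued: stats.attributes['dequeueCount'].to_i
    }
  end
end

sample = <<~XML
  <queues>
    <queue name="orders">
      <stats size="12" consumerCount="0" enqueueCount="12" dequeueCount="0"/>
    </queue>
    <queue name="audit">
      <stats size="0" consumerCount="2" enqueueCount="40" dequeueCount="40"/>
    </queue>
  </queues>
XML

stats = queue_stats(sample)
puts stats['orders'][:size]     # 12
puts stats['audit'][:consumers] # 2
```

The same approach should apply to `topics.jsp`, with `topic` elements in place of `queue`, assuming the page follows the same layout.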