
Default Monitor Thresholds

The tables below list the default monitor thresholds that are implicitly added in all environments. As an app owner, review these thresholds and update them to the values that best suit your app.

Monitor Type | Resource Name | Threshold | Definition | Description | Action
CPU Load | compute | Heartbeat | | If collection of any load metric (load1, load5, or load15) is missed, a missing-heartbeat pulse event is raised and the compute instance is marked unhealthy. | An unhealthy notification is raised, and a repair action is executed on the affected instance.
CPU Load | compute | CPU load | 'HighLoad' => threshold('1m', 'avg', 'load5', trigger('>=', 30, 3, 1), reset('<', 15, 1, 1)) | The compute is heavily loaded when the averaged load5 value reaches 30 or more, which sets the trigger. The alert resets when load5 drops below 15. | Notify only. No action.
CPU Usage | compute | CPU usage | 'HighCpuUsage' => threshold('5m', 'avg', 'CpuIdle', trigger('<=', 10, 15, 2), reset('>', 15, 15, 1)) | Compute utilization is very high when CpuIdle falls to 10% or less, meaning that 90% or more of the CPU is in use. The alert resets when CpuIdle rises above 15%. | Notify only. No action.
Socket Connection | compute | | | No default threshold is defined. The monitor can be set up for different connection states, such as TIME_WAIT, ESTABLISHED, and CLOSE_WAIT. |
Network | compute | | | No default threshold is defined. |
Filesystem | root volume (/) | Low disk space, Low disk inode | 'LowDiskSpace' => threshold('1m', 'avg', 'space_used', trigger('>=', 90, 5, 2), reset('<', 85, 5, 1)), 'LowDiskInode' => threshold('1m', 'avg', 'inode_used', trigger('>=', 90, 5, 2), reset('<', 85, 5, 1)) | The compute has low disk space when space_used on the root disk (/) reaches 90% or more, and low inodes when inode_used on the root disk reaches 90% or more. Both alerts reset when usage drops below 85%. | Notify only. No action.
System messages file | /var/log/messages | Critical link offline, Critical disk not responding, Critical SCSI log exception, Critical corrupt label | | |
Memory | compute | High mem use | 'HighMemUse' => threshold('1m', 'avg', 'free', trigger('<', 50000, 5, 4), reset('>', 80000, 5, 4)) | The compute is using too much memory when available (free) memory drops below 50 MB. The alert resets when free memory rises above 80000 (about 80 MB). | Notify only. No action.
Process | cron (crond process) | Crond process low, Crond process high | 'CrondProcessLow' => threshold('1m', 'avg', 'count', trigger('<', 1, 1, 1), reset('>=', 1, 1, 1)), 'CrondProcessHigh' => threshold('1m', 'avg', 'count', trigger('>=', 200, 1, 1), reset('<', 200, 1, 1)) | The crond process should always be running. If the process count drops below 1, CrondProcessLow raises an alert. The crond process count should stay below 200. If it reaches 200 or more, CrondProcessHigh raises an alert. | Notify only. No action.
Process | sendmail (postfix process) | Postfix process low, Postfix process high | 'PostfixProcessLow' => threshold('1m', 'avg', 'count', trigger('<', 1, 1, 1), reset('>=', 1, 1, 1)), 'PostfixProcessHigh' => threshold('1m', 'avg', 'count', trigger('>=', 200, 1, 1), reset('<', 200, 1, 1)) | The postfix process should always be running. If the process count drops below 1, PostfixProcessLow raises an alert. The postfix process count should stay below 200. If it reaches 200 or more, PostfixProcessHigh raises an alert. | Notify only. No action.
Process | SSH daemon (sshd process) | SSHD process low, SSHD process high | 'SshdProcessLow' => threshold('1m', 'avg', 'count', trigger('<', 1, 1, 1), reset('>=', 1, 1, 1)), 'SshdProcessHigh' => threshold('1m', 'avg', 'count', trigger('>=', 200, 1, 1), reset('<', 200, 1, 1)) | The sshd process should always be running. If the process count drops below 1, SshdProcessLow raises an alert. The sshd process count should stay below 200. If it reaches 200 or more, SshdProcessHigh raises an alert. | Notify only. No action.
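To make the definitions above easier to read, here is a minimal Ruby sketch that models `threshold(bucket, stat, metric, trigger, reset)` and replays a series of samples through HighLoad and CrondProcessLow. It assumes that the last two arguments of `trigger` and `reset` are a window length in one-minute samples and the number of samples in that window that must match. Those semantics, and the helper names `condition_met?` and `alert_open?`, are illustrative assumptions, not the monitoring engine's actual implementation. The `bucket` and `stat` fields are stored but not used.

```ruby
# Illustrative model of the threshold DSL; not the monitoring engine's code.
# ASSUMPTION: trigger/reset take (operator, value, duration, occurrences), where
# duration is a window of one-minute samples and occurrences is how many samples
# in that window must satisfy the comparison.
Condition = Struct.new(:operator, :value, :duration, :occurrences)
Threshold = Struct.new(:bucket, :stat, :metric, :trigger, :reset)

def trigger(operator, value, duration, occurrences)
  Condition.new(operator, value, duration, occurrences)
end

def reset(operator, value, duration, occurrences)
  Condition.new(operator, value, duration, occurrences)
end

def threshold(bucket, stat, metric, trigger_cond, reset_cond)
  Threshold.new(bucket, stat, metric, trigger_cond, reset_cond)
end

# True when enough of the most recent samples satisfy the condition.
def condition_met?(cond, samples)
  window = samples.last(cond.duration)
  window.count { |v| v.public_send(cond.operator, cond.value) } >= cond.occurrences
end

# Replays samples in order: the trigger opens the alert, the reset closes it.
def alert_open?(th, samples)
  alerting = false
  samples.each_index do |i|
    seen = samples[0..i]
    if !alerting && condition_met?(th.trigger, seen)
      alerting = true
    elsif alerting && condition_met?(th.reset, seen)
      alerting = false
    end
  end
  alerting
end

high_load = threshold('1m', 'avg', 'load5', trigger('>=', 30, 3, 1), reset('<', 15, 1, 1))
crond_low = threshold('1m', 'avg', 'count', trigger('<', 1, 1, 1), reset('>=', 1, 1, 1))

puts alert_open?(high_load, [10, 12, 31, 28, 20]) # => true  (31 >= 30; 20 is not < 15)
puts alert_open?(high_load, [10, 12, 31, 28, 14]) # => false (14 < 15 resets the alert)
puts alert_open?(crond_low, [1, 1, 0])            # => true  (process count fell below 1)
```

Because HighLoad triggers at 30 but resets only below 15, a load that hovers between the two values keeps the alert in its current state instead of flapping.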

Volume /app Thresholds

Monitor Type | Resource Name | Threshold | Definition | Description | Action
Filesystem | /app volume | Low disk space critical, Low disk inode critical | 'LowDiskSpaceCritical' => threshold('1m', 'avg', 'space_used', trigger('>=', 90, 5, 2), reset('<', 85, 5, 1)), 'LowDiskInodeCritical' => threshold('1m', 'avg', 'inode_used', trigger('>=', 90, 5, 2), reset('<', 85, 5, 1)) | The volume has low disk space when space_used on /app reaches 90% or more, and low inodes when inode_used on /app reaches 90% or more. Both alerts reset when usage drops below 85%. | Notify only. No action.
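The 90%/85% pairs on the /app volume form a hysteresis band: the alert is raised at 90% and cleared only after usage falls below 85%. The Ruby sketch below shows that behavior for LowDiskSpaceCritical, assuming space_used is simply used capacity divided by total capacity, expressed as a percentage. The helpers `percent_used` and `next_state` are hypothetical, and the duration and occurrence arguments (5, 2 and 5, 1) are ignored for brevity. The same logic applies to inode_used.

```ruby
# Hypothetical helpers illustrating the LowDiskSpaceCritical hysteresis band.
TRIGGER_AT  = 90 # trigger('>=', 90, ...)
RESET_BELOW = 85 # reset('<', 85, ...)

# ASSUMPTION: space_used (or inode_used) is used / total as a percentage.
def percent_used(used, total)
  return 0.0 if total.zero?
  used * 100.0 / total
end

# Returns the new alert state from the previous state and the current reading.
def next_state(alerting, pct)
  if alerting
    pct >= RESET_BELOW # stay alerting until usage drops below 85%
  else
    pct >= TRIGGER_AT  # raise the alert at 90% or more
  end
end

alerting = false
states = [[88, 100], [91, 100], [87, 100], [84, 100]].map do |used, total|
  alerting = next_state(alerting, percent_used(used, total))
end
p states # => [false, true, true, false]
```

The reading of 87% stays in the alert state because it is still above the 85% reset value, even though it is below the 90% trigger value.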

Tomcat Thresholds

Monitor Type | Resource Name | Threshold | Definition | Description | Action
Tomcat process | tomcat-daemon | Tomcat daemon process down | 'TomcatDaemonProcessDown' => threshold('1m', 'avg', 'up', trigger('<=', 98, 1, 1), reset('>', 95, 1, 1)) | The Tomcat daemon process is considered down when its process availability (up) falls to 98% or below. Although the threshold is expressed as a percentage, in practice a triggered alert means the process no longer exists. Do not change these threshold values to 100%. | Notify only. No action.
JvmInfo | tomcat | High mem use | 'HighMemUse' => threshold('1m', 'avg', 'percentUsed', trigger('>=', 90, 5, 1), reset('<', 85, 5, 1)) | Note: Values are calculated from http://localhost:#{port}/manager/status?XML=true |
ThreadInfo | tomcat | High thread use | 'HighThreadUse' => threshold('5m', 'avg', 'percentBusy', trigger('>=', 90, 5, 1), reset('<', 85, 5, 1)) | Note: Values are calculated from http://localhost:#{port}/manager/status?XML=true |
RequestInfo | tomcat | | No threshold is defined. | Note: Values are calculated from http://localhost:#{port}/manager/status?XML=true |
Log | tomcat | Critical log exception | 'CriticalLogException' => threshold('15m', 'avg', 'logtomcat_criticals', trigger('>=', 1, 15, 1), reset('<', 1, 15, 1)) | |
AppVersion | tomcat | | | |
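The JvmInfo and ThreadInfo rows read their values from Tomcat's manager status XML. The Ruby sketch below parses a made-up sample shaped like that output with REXML and derives percentUsed and percentBusy. This page does not say how those metrics are computed, so both formulas are assumptions: percentUsed is taken as (total - free) / max, and percentBusy as currentThreadsBusy / maxThreads.

```ruby
require 'rexml/document'

# Made-up sample modeled on http://localhost:#{port}/manager/status?XML=true
SAMPLE = <<~XML
  <status>
    <jvm>
      <memory free="50000000" total="1000000000" max="1000000000"/>
    </jvm>
    <connector name="http-nio-8080">
      <threadInfo maxThreads="200" currentThreadCount="195" currentThreadsBusy="190"/>
    </connector>
  </status>
XML

doc = REXML::Document.new(SAMPLE)

# ASSUMPTION: percentUsed = heap in use (total - free) relative to max heap.
mem = doc.elements['status/jvm/memory']
free, total, max_heap = %w[free total max].map { |name| mem.attributes[name].to_f }
percent_used = (total - free) * 100 / max_heap

# ASSUMPTION: percentBusy = busy threads relative to the connector's maxThreads.
threads = doc.elements['status/connector/threadInfo']
busy = threads.attributes['currentThreadsBusy'].to_f
max_threads = threads.attributes['maxThreads'].to_f
percent_busy = busy * 100 / max_threads

puts format('percentUsed=%.1f percentBusy=%.1f', percent_used, percent_busy)
# => percentUsed=95.0 percentBusy=95.0
puts "HighMemUse=#{percent_used >= 90} HighThreadUse=#{percent_busy >= 90}"
# => HighMemUse=true HighThreadUse=true
```

Both values meet the `trigger('>=', 90, ...)` condition of HighMemUse and HighThreadUse. If your Tomcat version computes these metrics differently, adjust the formulas rather than the thresholds.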

Artifact App – App-Specific Thresholds

Monitor Type | Resource Name | Threshold | Definition | Description | Action
Exception Monitoring | artifact | Level: log path /log/logmon/logmon.log; pattern to look for: Exception; threshold: 1 (alert on every occurrence); severity: Major; if more than 2: Critical | 'CriticalLogException' => threshold('15m', 'avg', 'logtomcat_criticals', trigger('>=', 1, 15, 1), reset('<', 1, 15, 1)), 'logfile' => '/log/apache-tomcat/catalina.out', 'warningpattern' => 'WARNING', 'criticalpattern' => 'CRITICAL' | The logfile, warningpattern, and criticalpattern parameters define the file to monitor and the warning and critical patterns to look for in it. | Notify only. No action.
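The `logfile`, `warningpattern`, and `criticalpattern` parameters can be pictured as a line scanner that counts matches. The Ruby sketch below is hypothetical: `scan_log` and `LogCounts` are illustrative names, the patterns are treated as regular expressions, and the critical count is assumed to be what feeds a metric such as `logtomcat_criticals`. The lines come from an in-memory array rather than /log/apache-tomcat/catalina.out.

```ruby
# Hypothetical sketch of pattern-based log monitoring; not the real collector.
LogCounts = Struct.new(:warnings, :criticals)

# ASSUMPTION: warningpattern/criticalpattern are treated as regular expressions.
def scan_log(lines, warning_pattern, critical_pattern)
  warning_re = Regexp.new(warning_pattern)
  critical_re = Regexp.new(critical_pattern)
  counts = LogCounts.new(0, 0)
  lines.each do |line|
    counts.warnings += 1 if warning_re.match?(line)
    counts.criticals += 1 if critical_re.match?(line)
  end
  counts
end

sample = [
  'INFO: Server startup in 1234 ms',
  'WARNING: Unable to load the native library',
  'CRITICAL: Connection pool exhausted'
]
counts = scan_log(sample, 'WARNING', 'CRITICAL')

# CriticalLogException uses trigger('>=', 1, ...): one critical line is enough.
puts "warnings=#{counts.warnings} criticals=#{counts.criticals} alert=#{counts.criticals >= 1}"
# => warnings=1 criticals=1 alert=true
```

To scan a real file, pass `File.foreach('/log/apache-tomcat/catalina.out')`, which returns an enumerator of lines. A production collector would also need to remember its read position between runs, which this sketch does not do.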

Apache Server Thresholds

Monitor Type | Resource Name | Threshold | Definition | Description | Action
ServerStatus | Apache | High sys cpu, High user cpu | 'TooBusy' => threshold('5m', 'avg', 'idle_workers', trigger('<', 5, 5, 5), reset('>', 5, 5, 5)), 'HighUserCpu' => threshold('5m', 'avg', 'cpu_user', trigger('>', 60, 5, 1), reset('<', 60, 5, 1)), 'HighSysCpu' => threshold('5m', 'avg', 'cpu_sys', trigger('>', 30, 5, 1), reset('<', 30, 5, 1)) | Note: All metrics are calculated using http://localhost:#{port}/server-status | Notify only. No action.
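The ServerStatus metrics come from Apache's mod_status page. Its machine-readable form, `server-status?auto`, returns one `Key: value` pair per line, which makes the TooBusy check straightforward to sketch in Ruby. Mapping `idle_workers` to the `IdleWorkers` field is an assumption, the sample text is made up, and `parse_server_status` is a hypothetical helper. This page does not say how `cpu_user` and `cpu_sys` are derived, so the sketch leaves them out.

```ruby
# Hypothetical parser for mod_status's machine-readable output (?auto).
def parse_server_status(text)
  text.each_line.with_object({}) do |line, stats|
    key, value = line.split(':', 2)
    next if value.nil? # skip lines without a "Key: value" pair
    stats[key.strip] = value.strip
  end
end

# Made-up sample in the shape of http://localhost:#{port}/server-status?auto
sample = <<~STATUS
  Total Accesses: 1520
  BusyWorkers: 46
  IdleWorkers: 4
STATUS

stats = parse_server_status(sample)
# ASSUMPTION: the TooBusy metric idle_workers corresponds to IdleWorkers.
idle_workers = Integer(stats.fetch('IdleWorkers'))
too_busy = idle_workers < 5 # TooBusy: trigger('<', 5, 5, 5)
puts "idle_workers=#{idle_workers} too_busy=#{too_busy}"
# => idle_workers=4 too_busy=true
```

With only 4 idle workers left, the sample falls under the TooBusy trigger value of 5.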

ActiveMQ Thresholds

Monitor Type | Resource Name | Threshold | Definition | Description | Action
BrokerStatus | activemq | High backlog | | Note: Metric values are calculated using queues: <protocol>://<host>:<port>/admin/xml/queues.jsp and topics: <protocol>://<host>:<port>/admin/xml/topics.jsp |
Log | activemq | Critical exceptions | 'CriticalLogException' => threshold('15m', 'avg', 'logtomcat_criticals', trigger('>=', 1, 15, 1), reset('<', 1, 15, 1)), 'logfile' => '/opt/apache-activemq-5.5.1/data/wrapper.log', 'warningpattern' => 'OutOfMemory', 'criticalpattern' => 'OutOfMemory' | The logfile, warningpattern, and criticalpattern parameters define the file to monitor and the warning and critical patterns to look for in it. Log path: /log/logmon/logmon.log. Pattern to look for: Exception. | Notify only. No action.
Memory | activemq | No threshold defined | 'protocol' => 'http', 'port' => '8161', 'path' => '/admin/index.jsp?printable=true' | Note: Metric values are calculated using <protocol>://<host>:<port>/admin/index.jsp?printable=true | Notify only. No action.
Process | Daemon | ActiveMQ daemon process down | 'ActiveMQDaemonProcessDown' => threshold('1m', 'avg', 'up', trigger('<=', 98, 1, 1), reset('>', 95, 1, 1)) | | Notify only. No action.
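No default High backlog threshold is defined, but the queue depths it would watch are available from the queues.jsp XML feed. The Ruby sketch below parses a made-up sample shaped like that feed with REXML and lists the queues whose pending-message count exceeds a limit. `BACKLOG_LIMIT` is a hypothetical value, not a documented default, and treating the `size` attribute as the backlog is an assumption.

```ruby
require 'rexml/document'

# Made-up sample modeled on <protocol>://<host>:<port>/admin/xml/queues.jsp
SAMPLE = <<~XML
  <queues>
    <queue name="orders">
      <stats size="1200" consumerCount="1" enqueueCount="5000" dequeueCount="3800"/>
    </queue>
    <queue name="audit">
      <stats size="3" consumerCount="2" enqueueCount="900" dequeueCount="897"/>
    </queue>
  </queues>
XML

# HYPOTHETICAL limit; this page defines no default High backlog value.
BACKLOG_LIMIT = 1000

doc = REXML::Document.new(SAMPLE)

# ASSUMPTION: a queue's backlog is the pending-message count in stats/@size.
backlog = {}
doc.elements.each('queues/queue') do |queue|
  backlog[queue.attributes['name']] = Integer(queue.elements['stats'].attributes['size'])
end

over_limit = backlog.select { |_name, size| size > BACKLOG_LIMIT }.keys
puts "queues over limit: #{over_limit.join(', ')}" # => queues over limit: orders
```

Choose a limit per queue if your queues have very different normal depths; a single global value is used here only to keep the sketch short.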