Default Monitor Thresholds
The tables below list the default monitor thresholds that are added implicitly in all environments. As an app owner, you should review these thresholds and adjust them to suit your app.
Monitor Type | Resource Name | Threshold Definition | Description | Action |
---|---|---|---|---|
CPU Load Heartbeat | compute | | If collection of any of the load metrics (load1, load5, or load15) is missed, a missing-heartbeat event is raised and the compute instance is marked unhealthy. | Unhealthy notification is raised. Repair action is executed on the affected instance. |
CPU Load | compute | `'HighLoad' => threshold('1m','avg','load5',trigger('>=',30,3,1),reset('<',15,1,1))` | Compute is heavily loaded when the load5 average reaches 30 or more, which sets the trigger. The alert resets when the average drops below 15. | Notify only. No action. |
CPU Usage | compute | `'HighCpuUsage' => threshold('5m','avg','CpuIdle',trigger('<=',10,15,2),reset('>',15,15,1))` | Compute utilization is very high when CpuIdle drops to 10% or below, which means that 90% or more of the CPU is in use. | Notify only. No action. |
Socket Connection | compute | No default threshold is defined. | The monitor can be set up for different connection states: TIME_OUT, ESTABLISHED, CLOSE_WAIT, etc. | |
Network | compute | No default threshold is defined. | | |
Filesystem root | volume / | `'LowDiskSpace' => threshold('1m', 'avg', 'space_used', trigger('>=', 90, 5, 2), reset('<', 85, 5, 1))` | Compute has low disk space when space_used reaches 90% or more on the root disk (/). | Notify only. No action. |
Filesystem root | volume / | `'LowDiskInode' => threshold('1m', 'avg', 'inode_used', trigger('>=', 90, 5, 2), reset('<', 85, 5, 1))` | Compute is low on inodes when inode_used reaches 90% or more on the root disk (/). | Notify only. No action. |
System messages | file /var/log/messages | | | |
Memory | compute | `'HighMemUse' => threshold('1m', 'avg', 'free', trigger('<', 50000, 5, 4), reset('>', 80000, 5, 4))` | Compute is using too much memory when available (free) memory drops below 50 MB. | Notify only. No action. |
Process cron | crond process | `'CrondProcessLow' => threshold('1m', 'avg', 'count', trigger('<', 1, 1, 1), reset('>=', 1, 1, 1))` | The crond process should be running. If it is not, the process count drops below 1 and the alert is raised. | Notify only. No action. |
Process cron | crond process | `'CrondProcessHigh' => threshold('1m', 'avg', 'count', trigger('>=', 200, 1, 1), reset('<', 200, 1, 1))` | The crond process count should stay below 200. If it reaches 200, the alert is raised. | Notify only. No action. |
Process sendmail | postfix process | `'PostfixProcessLow' => threshold('1m', 'avg', 'count', trigger('<', 1, 1, 1), reset('>=', 1, 1, 1))` | The postfix process should be running. If it is not, the process count drops below 1 and the alert is raised. | Notify only. No action. |
Process sendmail | postfix process | `'PostfixProcessHigh' => threshold('1m', 'avg', 'count', trigger('>=', 200, 1, 1), reset('<', 200, 1, 1))` | The postfix process count should stay below 200. If it reaches 200, the alert is raised. | Notify only. No action. |
Process SSH Daemon | sshd process | `'SshdProcessLow' => threshold('1m', 'avg', 'count', trigger('<', 1, 1, 1), reset('>=', 1, 1, 1))` | The sshd process should be running. If it is not, the process count drops below 1 and the alert is raised. | Notify only. No action. |
Process SSH Daemon | sshd process | `'SshdProcessHigh' => threshold('1m', 'avg', 'count', trigger('>=', 200, 1, 1), reset('<', 200, 1, 1))` | The sshd process count should stay below 200. If it reaches 200, the alert is raised. | Notify only. No action. |
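
Each definition above pairs a trigger with a reset at a different level, so an alert that has been raised stays raised until the metric recovers past the reset value. The following is a minimal Ruby sketch of that behavior, not the platform's implementation. The `trigger`, `reset`, and `threshold` helpers are hypothetical stand-ins for the DSL, the argument names (operator, value, duration, occurrences) are assumptions, and the duration and occurrence arguments are stored but ignored.

```ruby
# Hypothetical stand-ins for the DSL helpers used in the table above.
# The argument names are assumptions; only operator and value are evaluated.
Condition = Struct.new(:operator, :value, :duration, :occurrences) do
  # True when the sample satisfies the comparison, e.g. 31 >= 30.
  def met_by?(sample)
    sample.public_send(operator, value)
  end
end

Threshold = Struct.new(:bucket, :stat, :metric, :trigger, :reset)

def trigger(operator, value, duration, occurrences)
  Condition.new(operator, value, duration, occurrences)
end

def reset(operator, value, duration, occurrences)
  Condition.new(operator, value, duration, occurrences)
end

def threshold(bucket, stat, metric, trigger_condition, reset_condition)
  Threshold.new(bucket, stat, metric, trigger_condition, reset_condition)
end

# Walk the samples in order and return the alert state after each one.
# Simplification: duration and occurrence counts are not taken into account.
def alert_states(definition, samples)
  alerting = false
  samples.map do |sample|
    if alerting
      alerting = false if definition.reset.met_by?(sample)
    elsif definition.trigger.met_by?(sample)
      alerting = true
    end
    alerting
  end
end

HIGH_LOAD = threshold('1m', 'avg', 'load5', trigger('>=', 30, 3, 1), reset('<', 15, 1, 1))

# load5 averages of 10, 31, 20, 14, 29
p alert_states(HIGH_LOAD, [10, 31, 20, 14, 29])
# prints [false, true, true, false, false]
```

The third sample (20) is below the trigger level but above the reset level, so the alert remains active until the value drops to 14.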
Monitor Type | Resource Name | Threshold Definition | Description | Action |
---|---|---|---|---|
Filesystem /app | volume | `'LowDiskSpaceCritical' => threshold('1m', 'avg', 'space_used', trigger('>=', 90, 5, 2), reset('<', 85, 5, 1))` | Volume has low disk space when space_used reaches 90% or more on /app. | Notify only. No action. |
Filesystem /app | volume | `'LowDiskInodeCritical' => threshold('1m', 'avg', 'inode_used', trigger('>=', 90, 5, 2), reset('<', 85, 5, 1))` | Volume is low on inodes when inode_used reaches 90% or more on /app. | Notify only. No action. |
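
Since app owners are expected to review and update the defaults, it can help to think of an app's thresholds as the defaults with a few entries replaced. The sketch below illustrates that idea with a plain `Hash#merge`. It is an assumption about how overrides could be modeled, not a description of how the platform stores them. The helper methods are hypothetical, and the override values (80 and 75) are invented for illustration.

```ruby
# Hypothetical helpers that turn the DSL calls into plain hashes.
def trigger(operator, value, duration, occurrences)
  { operator: operator, value: value, duration: duration, occurrences: occurrences }
end

def reset(operator, value, duration, occurrences)
  { operator: operator, value: value, duration: duration, occurrences: occurrences }
end

def threshold(bucket, stat, metric, trigger_condition, reset_condition)
  { bucket: bucket, stat: stat, metric: metric,
    trigger: trigger_condition, reset: reset_condition }
end

# Defaults copied from the /app volume table above.
VOLUME_DEFAULTS = {
  'LowDiskSpaceCritical' => threshold('1m', 'avg', 'space_used',
                                      trigger('>=', 90, 5, 2), reset('<', 85, 5, 1)),
  'LowDiskInodeCritical' => threshold('1m', 'avg', 'inode_used',
                                      trigger('>=', 90, 5, 2), reset('<', 85, 5, 1))
}.freeze

# Illustrative override: alert earlier on disk space (values are made up).
APP_OVERRIDES = {
  'LowDiskSpaceCritical' => threshold('1m', 'avg', 'space_used',
                                      trigger('>=', 80, 5, 2), reset('<', 75, 5, 1))
}.freeze

# Entries in APP_OVERRIDES replace defaults with the same name.
EFFECTIVE_THRESHOLDS = VOLUME_DEFAULTS.merge(APP_OVERRIDES)

EFFECTIVE_THRESHOLDS.each do |name, definition|
  condition = definition[:trigger]
  puts "#{name}: #{definition[:metric]} #{condition[:operator]} #{condition[:value]}"
end
```

Thresholds that are not overridden, such as `LowDiskInodeCritical` here, keep their default values.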
Monitor Type | Resource Name | Threshold Definition | Description | Action |
---|---|---|---|---|
Tomcat process | tomcat-daemon | `'TomcatDaemonProcessDown' => threshold('1m', 'avg', 'up', trigger('<=', 98, 1, 1), reset('>', 95, 1, 1))` | The tomcat daemon process is considered down when its availability (up) drops to 98% or below. Although the threshold is expressed as a percentage, in practice an alert means the process no longer exists. Do not change the average values to 100%. | Notify only. No action. |
JvmInfo | tomcat | `'HighMemUse' => threshold('1m','avg', 'percentUsed',trigger('>=',90,5,1),reset('<',85,5,1))` | Note: Values are calculated from `http://localhost:#{port}/manager/status?XML=true`. | |
ThreadInfo | tomcat | `'HighThreadUse' => threshold('5m','avg','percentBusy',trigger('>=',90,5,1),reset('<',85,5,1))` | Note: Values are calculated from `http://localhost:#{port}/manager/status?XML=true`. | |
RequestInfo | tomcat | No threshold defined. | Note: Values are calculated from `http://localhost:#{port}/manager/status?XML=true`. | |
Log | tomcat | `'CriticalLogException' => threshold('15m', 'avg', 'logtomcat_criticals', trigger('>=', 1, 15, 1), reset('<', 1, 15, 1))` | | |
AppVersion | tomcat | | | |
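
The JvmInfo and ThreadInfo values are derived from the Tomcat manager status XML. The sketch below shows one plausible way to compute `percentUsed` and `percentBusy` from such a document. The sample XML is a trimmed, hypothetical response, and both formulas are assumptions, since the tables do not state how the monitor calculates the percentages. Fetching the URL (which normally requires manager credentials) is left out.

```ruby
require 'rexml/document'

# Trimmed, hypothetical sample of the manager status XML.
SAMPLE_STATUS_XML = <<~XML
  <status>
    <jvm>
      <memory free='268435456' total='536870912' max='1073741824'/>
    </jvm>
    <connector name='http-nio-8080'>
      <threadInfo maxThreads='200' currentThreadCount='25' currentThreadsBusy='10'/>
    </connector>
  </status>
XML

# Assumed formula: heap in use (total - free) as a percentage of max.
def jvm_percent_used(xml)
  memory = REXML::XPath.first(REXML::Document.new(xml), '//jvm/memory')
  free, total, max = %w[free total max].map { |name| memory.attributes[name].to_i }
  100.0 * (total - free) / max
end

# Assumed formula: busy threads as a percentage of maxThreads
# (uses the first connector found in the document).
def thread_percent_busy(xml)
  info = REXML::XPath.first(REXML::Document.new(xml), '//connector/threadInfo')
  busy = info.attributes['currentThreadsBusy'].to_i
  100.0 * busy / info.attributes['maxThreads'].to_i
end

puts format('percentUsed=%.1f percentBusy=%.1f',
            jvm_percent_used(SAMPLE_STATUS_XML),
            thread_percent_busy(SAMPLE_STATUS_XML))
# prints percentUsed=25.0 percentBusy=5.0
```

With these sample numbers neither value is close to the 90% trigger used by `HighMemUse` and `HighThreadUse`.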
Monitor Type | Resource Name | Threshold Definition | Description | Action |
---|---|---|---|---|
Exception Monitoring | artifact level | `'CriticalLogException' => threshold('15m', 'avg', 'logtomcat_criticals', trigger('>=', 1, 15, 1), reset('<', 1, 15, 1)), 'logfile' => '/log/apache-tomcat/catalina.out', 'warningpattern' => 'WARNING', 'criticalpattern' => 'CRITICAL'` | The logfile, warningpattern, and criticalpattern parameters define the file to be monitored for warning and critical patterns. Log path: /log/logmon/logmon.log. Pattern to look for: Exception. Threshold: 1 (alert on every occurrence). Severity: Major; if more than 2, Critical. | Notify only. No action. |
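
The log monitor parameters describe a file and two patterns to count. As a rough illustration of what such a monitor might compute, the sketch below counts matching lines in a small in-memory sample. The `count_matches` helper and the sample lines are hypothetical, and treating the patterns as regular expressions is an assumption.

```ruby
# Parameters as listed in the table above.
LOG_MONITOR = {
  'logfile' => '/log/apache-tomcat/catalina.out',
  'warningpattern' => 'WARNING',
  'criticalpattern' => 'CRITICAL'
}.freeze

# Count the lines that match each pattern.
# Assumption: the patterns are interpreted as regular expressions.
def count_matches(lines, config)
  warning = Regexp.new(config['warningpattern'])
  critical = Regexp.new(config['criticalpattern'])
  {
    warnings: lines.count { |line| line.match?(warning) },
    criticals: lines.count { |line| line.match?(critical) }
  }
end

# Invented sample lines; a real monitor would read config['logfile'],
# for example with File.foreach.
SAMPLE_LINES = [
  'INFO: Server startup in 5021 ms',
  'WARNING: Connection pool is nearly exhausted',
  'CRITICAL: Unable to reach database',
  'INFO: Reloading context'
].freeze

counts = count_matches(SAMPLE_LINES, LOG_MONITOR)
puts "warnings=#{counts[:warnings]} criticals=#{counts[:criticals]}"
# prints warnings=1 criticals=1

# CriticalLogException triggers when the critical count is >= 1.
puts "CriticalLogException would trigger: #{counts[:criticals] >= 1}"
```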
Monitor Type | Resource Name | Threshold Definition | Description | Action |
---|---|---|---|---|
ServerStatus | Apache | `'TooBusy' => threshold('5m','avg','idle_workers',trigger('<',5,5,5),reset('>',5,5,5))` | Note: All ServerStatus metrics are calculated using `http://localhost:#{port}/server-status`. | Notify only. No action. |
ServerStatus | Apache | `'HighUserCpu' => threshold('5m','avg','cpu_user',trigger('>',60,5,1),reset('<',60,5,1))` | | Notify only. No action. |
ServerStatus | Apache | `'HighSysCpu' => threshold('5m','avg','cpu_sys',trigger('>',30,5,1),reset('<',30,5,1))` | | Notify only. No action. |
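
The Apache metrics come from the `server-status` page. A minimal sketch of turning that page into numbers follows, assuming the machine-readable `server-status?auto` variant provided by mod_status, which returns `Key: value` lines. The sample text is invented, and the mapping from the `IdleWorkers` field to the `idle_workers` metric is an assumption.

```ruby
# Invented sample in the "Key: value" format of server-status?auto.
SAMPLE_STATUS = <<~STATUS
  Total Accesses: 1200
  BusyWorkers: 7
  IdleWorkers: 3
STATUS

# Parse "Key: value" lines into a hash of strings.
def parse_status(text)
  text.each_line.with_object({}) do |line, fields|
    key, value = line.chomp.split(': ', 2)
    fields[key] = value if value
  end
end

# Mirrors the TooBusy trigger: idle_workers < 5.
# Assumption: idle_workers corresponds to the IdleWorkers field.
def too_busy?(fields, limit = 5)
  Integer(fields.fetch('IdleWorkers')) < limit
end

fields = parse_status(SAMPLE_STATUS)
puts "idle_workers=#{fields['IdleWorkers']} too_busy=#{too_busy?(fields)}"
# prints idle_workers=3 too_busy=true
```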
Monitor Type | Resource Name | Threshold Definition | Description | Action |
---|---|---|---|---|
BrokerStatus | activemq | | Note: Metric values are calculated using the queues URL `<protocol>://<host>:<port>/admin/xml/queues.jsp` and the topics URL `<protocol>://<host>:<port>/admin/xml/topics.jsp`. | |
Log | activemq | `'CriticalLogException' => threshold('15m', 'avg', 'logtomcat_criticals', trigger('>=', 1, 15, 1), reset('<', 1, 15, 1)), 'logfile' => '/opt/apache-activemq-5.5.1/data/wrapper.log', 'warningpattern' => 'OutOfMemory', 'criticalpattern' => 'OutOfMemory'` | The logfile, warningpattern, and criticalpattern parameters define the file to be monitored for warning and critical patterns. Log path: /log/logmon/logmon.log. Pattern to look for: Exception. | Notify only. No action. |
Memory | activemq | No threshold defined. | `'protocol' => 'http', 'port' => '8161', 'path' => '/admin/index.jsp?printable=true'` Note: Metric values are calculated using `<protocol>://<host>:<port>/admin/index.jsp?printable=true`. | Notify only. No action. |
Process | Daemon | `'ActiveMQDaemonProcessDown' => threshold('1m', 'avg', 'up', trigger('<=', 98, 1, 1), reset('>', 95, 1, 1))` | | Notify only. No action. |
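
The activemq notes describe URLs of the form `<protocol>://<host>:<port><path>`. The sketch below shows how the listed settings could be combined into those URLs. The `monitor_url` helper is hypothetical, the host is not part of the listed settings and is assumed to be `localhost`, and reusing the Memory monitor's protocol and port for the BrokerStatus paths is also an assumption.

```ruby
require 'uri'

# Connection settings as listed for the activemq Memory monitor.
MEMORY_MONITOR = {
  'protocol' => 'http',
  'port' => '8161',
  'path' => '/admin/index.jsp?printable=true'
}.freeze

# Paths quoted in the BrokerStatus note.
BROKER_PATHS = {
  queues: '/admin/xml/queues.jsp',
  topics: '/admin/xml/topics.jsp'
}.freeze

# Fill in <protocol>://<host>:<port><path>. URI.parse raises if the
# assembled string is not a valid URI.
def monitor_url(config, host, path = config['path'])
  URI.parse("#{config['protocol']}://#{host}:#{config['port']}#{path}").to_s
end

puts monitor_url(MEMORY_MONITOR, 'localhost')
# prints http://localhost:8161/admin/index.jsp?printable=true

BROKER_PATHS.each_value do |path|
  puts monitor_url(MEMORY_MONITOR, 'localhost', path)
end
```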