Alarm Clock

From Roaring Penguin

Latest revision as of 09:27, 22 September 2017

An email from Cron reports:

  Subject: Cron <root@canit> test -x /usr/share/canit/scripts/canit-system && /usr/share/canit/scripts/canit-system check (failed)
  Body: Alarm clock


canit-system check

The cron job /etc/cron.d/watch-canit runs once a minute:

  * * * * * root test -x /usr/share/canit/scripts/canit-system && /usr/share/canit/scripts/canit-system check

In normal operation this check should return very quickly, typically in less than a second. If the process times out, it can generate this error.
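To illustrate where a bare "Alarm clock" message can come from, here is a minimal sketch in plain shell. It does not use canit-system at all; it only shows the general Unix behaviour that a process killed by an unhandled SIGALRM makes the parent shell report the signal (typically as "Alarm clock") and see an exit status of 128 plus the signal number. The variable names are hypothetical.

```shell
#!/bin/sh
# Illustration only: unrelated to the real canit-system script.

# 1. The child shell sends SIGALRM to itself with no handler installed,
#    so it is killed by the signal. The parent shell usually reports
#    this on stderr (commonly as "Alarm clock").
untrapped_status=0
sh -c 'kill -ALRM $$' || untrapped_status=$?
echo "untrapped exit status: $untrapped_status"   # 128 + signal number

# 2. Same signal, but the child installs a handler first, so it
#    survives, prints its own message, and exits normally.
trapped_status=0
sh -c 'trap "echo caught SIGALRM" ALRM; kill -ALRM $$; exit 0' || trapped_status=$?
echo "trapped exit status: $trapped_status"
```

On Linux, where SIGALRM is signal 14, the first status would be 142. A cron job that dies this way produces no output of its own, which is consistent with an email body containing only "Alarm clock".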

Either of the following conditions could cause this error:

  1. The server on which canit-system check was run has a high load.
  2. The database server that the CanIt system uses is very unresponsive.

When troubleshooting, check the load on both the server that reported the error and the CanIt system's database server.
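As a rough complement to the CanIt-specific tools, the operating system's own load average can be compared against the number of CPUs. The sketch below is an assumption-laden example: it is Linux-specific (it reads /proc/loadavg), the helper name load_is_high is hypothetical, and "load above CPU count" is only a rule of thumb, not a CanIt-defined threshold.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit 0) if LOAD is greater than CPUS.
# Usage: load_is_high LOAD CPUS    e.g. load_is_high 3.75 4
load_is_high() {
    awk -v load="$1" -v cpus="$2" 'BEGIN { exit !(load > cpus) }'
}

# Linux-specific: the first field of /proc/loadavg is the 1-minute load average.
load=$(cut -d' ' -f1 /proc/loadavg)
cpus=$(getconf _NPROCESSORS_ONLN)

if load_is_high "$load" "$cpus"; then
    echo "load $load exceeds $cpus CPUs: host looks overloaded"
else
    echo "load $load is within $cpus CPUs"
fi
```

Run it on both the server that sent the email and the database server, since either one can be the bottleneck.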

The cron job runs once a minute, so if the problem is ongoing/chronic (heh) you will see several of these emails. If, on the other hand, only a single instance was reported, it is most likely a transient error that can be ignored.

It is good to run the following on the server that reported the error:

  md-mx-ctrl load
  md-mx-ctrl hload
  time /etc/init.d/canit-system check

The last command, run by hand, shows how long the check takes to respond.
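If the check needs to be timed repeatedly, a small wrapper can flag slow runs. This is only a sketch: the function name slow_check and the threshold are hypothetical, and it measures whole seconds with date +%s, so it is coarser than the time command.

```shell
#!/bin/sh
# Hypothetical helper: run a command and report whether it took longer
# than a threshold (in whole seconds).
# Usage: slow_check THRESHOLD_SECONDS COMMAND [ARGS...]
slow_check() {
    threshold=$1
    shift
    start=$(date +%s)
    status=0
    "$@" >/dev/null 2>&1 || status=$?    # discard the command's own output
    end=$(date +%s)
    elapsed=$((end - start))
    if [ "$elapsed" -gt "$threshold" ]; then
        echo "SLOW: took ${elapsed}s (exit status $status)"
        return 1
    fi
    echo "OK: took ${elapsed}s (exit status $status)"
    return 0
}
```

For example, slow_check 2 /etc/init.d/canit-system check would report SLOW if the check took more than two seconds. The two-second threshold is an arbitrary choice for illustration.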

We also have another article with more extensive instructions on Performance Troubleshooting.

NOTE

The development team reports that most conditions that trigger an alarm() are wrapped in eval{...} blocks and would generate a more useful error message, so it is unclear exactly which condition(s) cause this bare, untrapped alarm() warning.