Alarm Clock

From Roaring Penguin

An email from Cron reports:

  Subject: Cron <root@canit> test -x /usr/share/canit/scripts/canit-system && /usr/share/canit/scripts/canit-system check (failed)
  Body: Alarm clock


canit-system check

The cron job in /etc/cron.d/watch-canit runs once a minute:

  * * * * * root test -x /usr/share/canit/scripts/canit-system && /usr/share/canit/scripts/canit-system check

In normal operation this should return very quickly, typically in less than a second. If the check takes too long and times out, it can produce this error.

Two conditions can cause the check to time out:

  1. The server on which canit-system check was run is under high load.
  2. The database server that the CanIt system uses is very unresponsive.

When troubleshooting this error, check the load on both the server that reported the error and the CanIt system's database server.

The cron job runs once a minute, so if the problem is ongoing (chronic, heh) you will see several of these emails. If only a single instance was reported, it is most likely a transient error that can be ignored.

It is good to check the following on the server that reported the error:

  md-mx-ctrl load
  md-mx-ctrl hload

Also run the check by hand to see how long it takes to respond:

  time /etc/init.d/canit-system check
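
If you want to repeat the manual timing several times, a small wrapper can help. The following is only a sketch: the function names elapsed_seconds and classify and the idea of a fixed threshold are assumptions made for illustration, not part of CanIt.

```shell
#!/bin/sh
# Hypothetical helpers, not shipped with CanIt.

# Run the given command, discard its output, and print how many
# whole seconds it took (resolution is one second).
elapsed_seconds() {
    start=$(date +%s)
    "$@" >/dev/null 2>&1 || true   # timing only; the exit status is ignored
    end=$(date +%s)
    echo $((end - start))
}

# Print "slow" if the elapsed time ($1) reached the threshold ($2),
# otherwise print "ok". The threshold is an arbitrary choice.
classify() {
    if [ "$1" -ge "$2" ]; then
        echo slow
    else
        echo ok
    fi
}
```

For example, classify "$(elapsed_seconds /etc/init.d/canit-system check)" 5 would print slow if the check took five seconds or longer. The five-second threshold is an arbitrary value picked for the example, not a documented CanIt limit.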

We also have another article with more extensive instructions on Performance Troubleshooting.

NOTE

The development team reports that most code paths that call alarm() are wrapped in eval { ... } blocks and would generate a more useful error message, so it is unclear exactly which condition(s) cause this bare, untrapped alarm() warning.
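
To illustrate the difference between a trapped and an untrapped alarm, here is a shell analogy. It is not CanIt's code (the note above suggests CanIt uses Perl's eval and alarm()), and the function names are made up for this example. A process that receives SIGALRM without a handler is killed by it, and the invoking shell typically reports this as "Alarm clock"; a process that installs a handler can print its own message instead.

```shell
#!/bin/sh
# Hypothetical demonstration functions, for illustration only.

# Untrapped: the child shell sends itself SIGALRM and is killed by it.
# The parent sees exit status 128 + signal number (SIGALRM is 14 on
# Linux) and may print "Alarm clock" on stderr.
untrapped_status() {
    status=0
    sh -c 'kill -ALRM $$' || status=$?
    echo "$status"
}

# Trapped: the child installs a handler first, so the signal produces
# a message chosen by the program instead of killing it.
trapped_output() {
    sh -c 'trap "echo timed out" ALRM; kill -ALRM $$'
}
```

On Linux, untrapped_status should print 142, and trapped_output should print timed out. The bare "Alarm clock" in the cron email matches the first case.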