Here’s an easy way to get a short daily email with your server’s average CPU load, calculated over the whole period since the server was last rebooted.
1. Log in to your server via SSH as root.
2. Run this command to create and open a new file called “dailyloadstat.sh” in the nano editor:
nano ~/dailyloadstat.sh
3. Insert the following lines into the new file you just created:
#!/bin/sh
# Save the since-boot CPU report and the uptime line, mail them, then clean up
iostat -c > LoadStats
uptime >> LoadStats
mail -s "Average Load Report" you@email.com < LoadStats
rm -f LoadStats
4. Save the file and exit the editor (in nano, press Ctrl+O, then Enter, then Ctrl+X). Next, change the file’s permissions to make it executable:
chmod 755 ~/dailyloadstat.sh
5. Now create a cron job to run this script every day. To run it daily at 4:00 a.m., open root’s crontab with “crontab -e” and add this line:
0 4 * * * ~/dailyloadstat.sh
EXPLANATION
The script above puts the output from two commands, “iostat” and “uptime”, into a temporary file called “LoadStats”, then emails that file to the address you choose before deleting it from the server.
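If you would rather not create a temporary file at all, the same report can be built by grouping the two commands and piping their combined output straight into “mail”. The sketch below wraps that in a shell function; the function name “send_load_report” is hypothetical, and you@email.com is still a placeholder address.

```shell
# Sketch: the same daily report without a temporary file.
# Both commands run as one group and their combined output goes straight to mail.
send_load_report() {
  recipient="$1"   # placeholder, e.g. you@email.com
  {
    iostat -c
    uptime
  } | mail -s "Average Load Report" "$recipient"
}

# In dailyloadstat.sh you would call it as:
# send_load_report you@email.com
```

The trade-off is small: nothing is ever written to disk, so there is no leftover file if the script is interrupted, but you also lose the option of inspecting “LoadStats” when debugging.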
The “iostat” command (part of the sysstat package on most Linux distributions) is mainly used for monitoring input/output devices, but with the “-c” option it prints only its CPU utilization report, which breaks average CPU usage down into several categories. The best part about this command is that, unlike “top” or “uptime”, its first report covers the entire period since the server was last restarted. Since most servers are rarely restarted unless there’s a problem, this gives you a good gauge of how heavily your server is loaded on a normal, everyday basis. The report shows these categories:
%user – average CPU% spent running user-level processes (applications)
%nice – average CPU% spent on user-level processes running with a “nice” priority
%system – average CPU% spent at the system (kernel) level
%iowait – % of time the CPU was idle while the system had outstanding disk I/O requests
%steal – % of time a virtual CPU spent waiting while the hypervisor served another virtual processor
%idle – % of time the CPU was idle with no outstanding disk I/O requests
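To see how the report’s header maps onto these categories, here is a minimal sketch of a shell function (the name “cpu_categories” is hypothetical) that pairs each label on the “avg-cpu:” header line with the number beneath it. It assumes the usual layout of “iostat -c” output, a header line followed by one row of numbers, and reads only the first (since-boot) report.

```shell
# Sketch: print each CPU category from `iostat -c` output as "label value".
# Reads the report on stdin; only the first (since-boot) report is used.
cpu_categories() {
  awk '
    /^avg-cpu:/ && !done {
      n = split($0, labels)     # labels[1] is the "avg-cpu:" tag itself
      getline                   # the following line holds the numbers
      split($0, values)
      for (i = 2; i <= n; i++)  # label i belongs to value i-1
        print labels[i], values[i - 1]
      done = 1
    }
  '
}
```

For example, “iostat -c | cpu_categories” prints one line per category, ending with the “%idle” line.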
All of these percentages should add up to roughly 100% (small differences come from rounding). In general, the higher your %idle, the better, because it means your system is not taxed to capacity (my %idle is usually around 98.5, meaning my server is far from overworked). A consistently high number in any category other than %idle is worth investigating to find out what is keeping the system so busy.
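If you want the daily script to point out a busy server for you, a small check like the following could be added. It is a sketch, not a tuned rule: the function name “check_idle” and the default threshold of 80 (%idle) are hypothetical choices, and it assumes the numbers use a dot as the decimal separator.

```shell
# Sketch: summarize `iostat -c` output from stdin and warn when %idle is low.
# The default threshold of 80 is a hypothetical example, not a recommendation.
check_idle() {
  awk -v min="${1:-80}" '
    /^avg-cpu:/ && !done {
      getline                          # the row of percentages
      total = 0
      for (i = 1; i <= NF; i++)        # every category, %idle included
        total += $i
      idle = $NF                       # %idle is the last column
      printf "total=%.2f idle=%.2f\n", total, idle
      if (idle + 0 < min + 0)
        print "WARNING: %idle is below " min
      done = 1
    }
  '
}
```

You might call it as “LC_ALL=C iostat -c | check_idle 90”; forcing the C locale keeps iostat from printing commas as decimal separators, which awk would not parse as numbers.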
Also included in the report is the output of the “uptime” command, which shows how long your server has been running since its last reboot. It also gives the system load averages (roughly, the average number of processes running or waiting to run) over the past 1, 5, and 15 minutes, which gives you an idea of much more recent server usage, although I personally rarely find this useful (look for a more in-depth article about server load coming soon).
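On Linux, the three load averages that “uptime” prints can also be read from /proc/loadavg. A common rough rule of thumb is to compare them with the number of CPUs: a load near the CPU count means that, on average, there was about one task running or waiting per CPU. The sketch below is Linux-only and assumes GNU coreutils’ “nproc”; the function name “load_per_cpu” is hypothetical, and the optional file argument exists only so it can be tried on a saved copy of /proc/loadavg.

```shell
# Sketch (Linux only): compare the 1-, 5- and 15-minute load averages with
# the number of CPUs. /proc/loadavg starts with the same three numbers that
# uptime prints; the optional argument lets you point it at another file.
load_per_cpu() {
  cpus=$(nproc)    # processing units available to this process
  awk -v cpus="$cpus" '{
    printf "1min=%s 5min=%s 15min=%s cpus=%d per-cpu(1min)=%.2f\n",
           $1, $2, $3, cpus, $1 / cpus
  }' "${1:-/proc/loadavg}"
}
```

Run “load_per_cpu” on its own to read the live values; a per-CPU figure well below 1 fits the same “far from overworked” picture as a high %idle.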
Do you have other metrics that you monitor frequently on your server? Have a better suggestion for passively monitoring server load? Let us all know in the comments!