Overheating boxes…

So apparently sometimes doing ‘stupid things(tm)’ can overheat your disks, and your box… So you should always keep an eye on the temperature.

So, mostly for my own future benefit, and maybe for others, here is a quick script that checks the temperature of the processors and the disk, and reports any changes in /var/log/messages, to see what is going on.  I guess I should make it more modular, and not hardcode stuff, but here we are.

#!/bin/bash
#
#
# Read the disk temperature (column 10 of the SMART attribute table)
diskt=$(smartctl -d ata -A /dev/sda | grep Temperature_Celsius | awk '{print $10}')
if [ "$diskt" -gt 40 ]
then
	error="Disk temperature is hotter than 40c it's now $diskt\n"
else
#	error="Disk temperature is fine, it's $diskt\n"
	:
fi

# Read the CPU core temperatures, one per line, e.g. +45.0°C
sensors | grep Core | awk '{print $3}' > /tmp/dat.txt
j=0
while read line
do
	# Strip the leading + and the trailing .0°C to leave a whole number
	cpu=$(echo "$line" | sed 's/^+//; s/\.[0-9]*°C//')
	if [ "$cpu" -gt 82 ]
	then
		error="$error\nCPU core $j temperature is $cpu"
	else
#		error="$error\nCPU core $j temperature is $cpu"
		:
	fi
	j=$(($j+1))
done < /tmp/dat.txt
rm -f /tmp/dat.txt

# Report anything new at the tail of /var/log/messages since the last run
if [[ -f /tmp/messages.1 ]]
then
	tail /var/log/messages > /tmp/messages.2
	logadd=$(diff /tmp/messages.1 /tmp/messages.2)
	if [ ! -z "$logadd" ]
	then
		error="$error\n\n$logadd"
	else
		:
	fi
	mv /tmp/messages.2 /tmp/messages.1
else
	tail /var/log/messages > /tmp/messages.1
fi

# Mail the collected errors, if there are any
if [ ! -z "$error" ]
then
	echo "there are issues.."
	echo -e "$error" > /tmp/message.tmp
	mail -s "errors on machine_name" your_name@your_domain.com < /tmp/message.tmp
fi

Of course, it can and should be expanded to check for things like SMART disk errors and whatever else is going on.  And in the crontab, add something like:

*/5  *    *   *   *   /root/report.sh

That runs it every five minutes.  As always it’s light on comments, lacks full paths to executables, and doesn’t have much of anything to keep it safe.  I’m sure if I were smart I could read more from pipes and variables, but I’m old so I read from files.  If you were looking for the bash shell script expert, it’s not me. lol
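For anyone who does want to read from pipes instead of files, and to bolt on a basic SMART health check, here is a minimal sketch. It is not what runs on my box. The function names (core_temps, hot_cores, smart_ok) are made up for illustration. It assumes sensors prints lines shaped like "Core 0: +45.0°C" and that smartctl -H prints the ATA-style "overall-health self-assessment test result: PASSED" line; other hardware may word things differently.

```shell
#!/bin/bash

# Hypothetical helper: read sensors-style text on stdin and print
# "core-index whole-degrees" for every line that starts with Core.
core_temps() {
	awk '/^Core/ {
		t = $3                  # third field, e.g. +45.0°C
		gsub(/[^0-9.]/, "", t)  # keep only digits and the decimal point
		print n++, int(t)       # running core index, whole degrees
	}'
}

# Hypothetical helper: print one warning line per core hotter than the limit.
hot_cores() {
	local limit=$1
	core_temps | while read -r core temp
	do
		if [ "$temp" -gt "$limit" ]
		then
			echo "CPU core $core temperature is $temp"
		fi
	done
}

# Hypothetical helper: read smartctl -H text on stdin and succeed only if
# the overall-health line says PASSED (assumed ATA wording).
smart_ok() {
	grep -q 'overall-health self-assessment test result: PASSED'
}

# Possible usage inside the report script (the threshold is the same 82 as before):
#   error="$error$(sensors | hot_cores 82)"
#   smartctl -H /dev/sda | smart_ok || error="$error\nSMART health check failed on /dev/sda"
```

The loop in hot_cores prints its warnings rather than appending to a variable because a while loop on the receiving end of a pipe runs in a subshell, so anything it assigns is gone once the loop ends. Capturing the printed output with $( ) gets around that without a temp file.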

7 thoughts on “Overheating boxes…”

  1. I’m surprised you don’t have an Icinga server knocking around to alert you if this becomes a problem. The problem with running one at work for everything is that I don’t ever check logs unless I’m diagnosing a problem that Icinga has already warned me about :-/.

    Oh, I did get that Alpha machine built, just having a nasty problem getting X running on it (and I think the Nouveau driver in Gentoo won’t work out of the box!)

    • It’s just one box, and on the other side of the planet…. But it does sound like a good idea, although if something does go wrong, there isn’t much I can do about it from here.

      • I got Windows NT 4 on it quite nicely, FX!32 is a bit of a disappointment but I think I built it up too much over the years in my head.

        I just wanted to play around with Linux on an Alpha and only a crappy bug in X is stopping me. An older version would most likely work, but I am running a very up-to-date version of Gentoo. (only computer in my house running kernel 4!)

        • FX!32 was a really amazing cross of old and new tech. At its heart was an interpreter, a profiler, and an instruction translator. The first run is always the worst, as the interpreter is slow and the profiler looks for frequently executed blocks, which are saved to disk then translated after the fact. It was cool, but yes, exe + dll code ended up being at least 2x the size. The real bummer is that you couldn’t mix Alpha and x86 DLLs, so plenty of runtimes had to be installed twice.

          But we live in a GHz-powered JIT world where doing the recompile is fast enough on the fly.

          The funny thing for me is that now anything I’d want on the Alpha I either have as binary or source. MSVC 6 on the Alpha rocked. But sadly the closed nature of DEC making processors, chipsets, BIOS and boards wasn’t going to give us Taiwanese dual, quad, or octa boards, otherwise who was going to buy them?

          I still am amazed by MS’s TerraServer, which pre-dated Google Earth in those critical ’97 internet years, but MS couldn’t leverage it, just as DEC did nothing of any significance with AltaVista.

  2. smartmontools (the package that smartctl comes from) already has a tool for periodically checking for issues reported by SMART and emailing you: “smartd”. I don’t know much about it though; Linux distributions tend to just have it pre-configured.
