Monitoring internal SAS disks on Sun servers
Posted on November 30th, 2010
We need to monitor disk health on a Sun M4000 machine running Solaris 10. The smartmontools package handles this just fine.
1. Install smartmontools using the Blastwave package system.
2. Check that the SMART status can be read from the disk:
bash-3.00# /opt/csw/sbin/smartctl -H /dev/rdsk/c0t0d0s2
smartctl version 5.36 [sparc-sun-solaris2.8] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

SMART Health Status: OK
3. We want to integrate this check into Nagios, so we did a bit of searching and found the check_smart.pl plugin. Since this plugin was developed with Linux in mind, we made some corrections and adapted it for Solaris.
4. The plugin uses smartctl, so we need to allow this tool to run as root while the Nagios plugin itself runs as the nagios user.
We modified the Solaris pfexec configuration to run smartctl as root.
Create a profile for nagios.
file: /etc/security/prof_attr
Nagios:::Nagios Profile:
Modify execution attributes
file: /etc/security/exec_attr
Nagios:suser:cmd:::/opt/csw/sbin/smartctl:uid=0;gid=0;euid=0
Assign Nagios profile to nagios user
usermod -P Nagios nagios
Check it out
bash-3.00# su - nagios
Sun Microsystems Inc. SunOS 5.10 Generic January 2005
$ bash
bash-3.00$ pfexec /opt/csw/sbin/smartctl -H /dev/rdsk/c0t0d0s2
smartctl version 5.36 [sparc-sun-solaris2.8] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

SMART Health Status: OK
5. Next, we modify the Linux-friendly plugin to run pfexec smartctl instead of sudo.
6. Modify nrpe.cfg:
command[check_rootdisk]=/usr/local/nagios/libexec/check_smart.pl -e -d /dev/rdsk/c0t0d0s2
Solaris ready check_smart.pl plugin
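To illustrate the idea behind step 5, here is a minimal shell sketch of how the health line printed by smartctl could be mapped to a Nagios exit code. It is not the check_smart.pl plugin itself: the function name smart_status_to_nagios is hypothetical, and it assumes the SCSI-style "SMART Health Status:" line shown in the output above.

```shell
# Sketch only: map the health line printed by "smartctl -H" to a Nagios status.
# smart_status_to_nagios is a hypothetical helper, not part of check_smart.pl.

# Reads smartctl output on stdin, prints a one-line status, and returns the
# Nagios exit code: 0 = OK, 2 = CRITICAL, 3 = UNKNOWN.
smart_status_to_nagios() {
    health=$(grep 'SMART Health Status:' | sed 's/.*SMART Health Status:[ ]*//')
    case "$health" in
        OK) echo "OK: SMART health status is OK"; return 0 ;;
        "") echo "UNKNOWN: no SMART health line in smartctl output"; return 3 ;;
        *)  echo "CRITICAL: SMART health status is $health"; return 2 ;;
    esac
}

# Intended use on the monitored host (paths taken from the post):
#   pfexec /opt/csw/sbin/smartctl -H /dev/rdsk/c0t0d0s2 | smart_status_to_nagios
```

Because the function only reads standard input, it can be tried against saved smartctl output before it is wired to pfexec.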
Filed under Plugins
Monitoring free memory in Solaris
Posted on August 4th, 2010
This post introduces a small plugin for the Nagios NMS that monitors free physical memory on Solaris. It should be run locally or through the nrpe daemon, and it depends on the Solaris kstat command. A brief description and usage notes are in the plugin file itself.
nagios plugin for Solaris free memory monitoring
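As a rough sketch of the kind of calculation such a check performs (not the plugin linked above), the free page count reported by kstat can be converted to megabytes and compared against thresholds. The helper names and the threshold values are hypothetical, and the arithmetic syntax assumes bash or ksh rather than the old Bourne shell.

```shell
# Sketch only: hypothetical helpers, not the plugin from the post.
# Assumes bash or ksh for the $(( )) arithmetic.

# Convert a count of free pages and a page size in bytes to whole megabytes.
pages_to_mb() {
    echo $(( $1 * $2 / 1048576 ))
}

# Compare free megabytes against warning and critical floors and return the
# matching Nagios exit code (0 = OK, 1 = WARNING, 2 = CRITICAL).
check_free_mb() {
    free_mb=$1; warn_mb=$2; crit_mb=$3
    if [ "$free_mb" -le "$crit_mb" ]; then
        echo "CRITICAL: ${free_mb} MB free"; return 2
    elif [ "$free_mb" -le "$warn_mb" ]; then
        echo "WARNING: ${free_mb} MB free"; return 1
    fi
    echo "OK: ${free_mb} MB free"; return 0
}

# On Solaris the inputs could be gathered like this (thresholds are examples):
#   pages=$(kstat -p unix:0:system_pages:freemem | awk '{print $2}')
#   check_free_mb "$(pages_to_mb "$pages" "$(pagesize)")" 512 128
```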
Tags: monitoring, Nagios, Solaris
Filed under Plugins, Solaris
Monitoring server overheat with ipmitool
Posted on June 22nd, 2010
Recently I was faced with a sudden overnight shutdown of a Sun x4600 server. Investigation revealed that there had been an air conditioning failure and the machine had shut itself down. I dug into the logs and found that the shutdown was initiated by the server’s system controller. An excerpt from the log follows:
System ACPI Power State : sys.acpi : S5/G2: soft-off
Power Supply : ps1.pwrok : State Deasserted
Power Supply : ps3.pwrok : State Deasserted
Power Supply : ps2.pwrok : State Deasserted
Temperature : p2.t_amb : Upper Non-recoverable going high : reading 46 > threshold 45 degrees C
Hot removal of /SYS/PS0
Entity Presence : ps0.prsnt : Device Absent
Processor : p0.cardfail : State Asserted
Temperature : p0.t_amb : Upper Critical going high : reading 39 > threshold 38 degrees C
Processor : p3.cardfail : State Asserted
Temperature : p3.t_amb : Upper Critical going high : reading 39 > threshold 38 degrees C
Processor : p1.cardfail : State Asserted
Temperature : p1.t_amb : Upper Critical going high : reading 39 > threshold 38 degrees C
Processor : p2.cardfail : State Asserted
Temperature : p2.t_amb : Upper Critical going high : reading 39 > threshold 38 degrees C
This nagios plugin for monitoring server overheat with ipmi will keep me informed about this kind of event in the future and help avoid unexpected downtime.
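For illustration only, a check along these lines could read the temperature sensors with ipmitool and flag any sensor whose status is not "ok". This is a sketch, not the plugin linked above. It assumes the pipe-separated layout that "ipmitool sdr type Temperature" usually prints (name | sensor id | status | entity | reading), and the function name check_temp_sensors is hypothetical.

```shell
# Sketch only: hypothetical helper, not the plugin from the post.
# Reads sensor lines on stdin; fields are assumed to be
#   name | sensor id | status | entity | reading
# Returns a Nagios exit code: 0 = OK, 2 = CRITICAL, 3 = UNKNOWN.
# On Solaris use nawk, since the old /usr/bin/awk has no gsub().
check_temp_sensors() {
    awk -F'|' '
        NF < 5 { next }                        # skip lines that are not sensor rows
        {
            gsub(/^[ \t]+|[ \t]+$/, "", $1)    # trim sensor name
            gsub(/[ \t]/, "", $3)              # status: ok, nc, cr, nr, ns
            gsub(/^[ \t]+|[ \t]+$/, "", $5)    # trim reading
        }
        $3 == "ns" { next }                    # sensor has no reading, ignore it
        $3 != "ok" { bad = bad " " $1 "=" $5 "(" $3 ")" }
        { n++ }
        END {
            if (n == 0) { print "UNKNOWN: no temperature sensors found"; exit 3 }
            if (bad != "") { print "CRITICAL: sensors not ok:" bad; exit 2 }
            print "OK: " n " temperature sensors ok"
        }'
}

# Intended use:
#   ipmitool sdr type Temperature | check_temp_sensors
```

Treating every non-ok status as critical is a deliberate simplification; a real check would probably map the non-critical state to a warning.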
Filed under Plugins, Sun Hardware
Monitoring HDS AMS storage with nagios using SNMP protocol
Posted on June 16th, 2010
We are going to get rid of the proprietary HDS Hi-Track software and use the industry-standard SNMP protocol to monitor Hitachi midrange storage systems. The NMS of choice will be Nagios. In general there are two ways to monitor with SNMP, active polling and passive trap handling, and Nagios supports both. You are required to configure your storage system’s SNMP agent and specify the SNMP community and the SNMP trap destination. When done, check that it is working by running the check_snmp plugin.
# ./check_snmp -H ams-ctl0 -C public -o sysDescr.0 -P 1
SNMP OK - HITACHI DF600F Ver 0781/A-M |
OK, you are done with the SNMP agent configuration; let's begin configuring the SNMP manager, i.e. Nagios.
1. Active monitoring, using a plugin that executes an SNMP GET request for some OID. Here is a small Hitachi AMS storage monitoring nagios plugin which will do the task. It is ready to run with the Nagios embedded Perl interpreter (ePN).
2. Passive monitoring using SNMP traps handling.
First, we need to install the Net-SNMP project’s snmptrapd daemon and point it to the program that will handle all incoming traps. We chose to run snmptt on every trap event. The snmptrapd configuration will look like:
traphandle default /usr/sbin/snmptt
disableAuthorization yes
donotlogtraps yes
Next, we configure snmptt itself so that it knows what to do when traps arrive. Open the snmptt.ini config and create a [TrapFiles] section:
[TrapFiles]
snmptt_conf_files = <<END
/usr/local/etc/snmptt.conf.AMS500
END
Next, create snmptt.conf.AMS500 by running the snmpttconvertmib tool on dfraid.mib, which resides on the AMS500 SNMP CD.
# export PATH=$PATH:/usr/local/bin
# snmpttconvertmib --in=dfraid.mib \
> --out=/usr/local/etc/snmptt.conf.AMS500 \
> --exec='/usr/local/nagios/libexec/eventhandlers/submit_check_result $r TRAP 1'
snmpttconvertmib calls snmptranslate from the Net-SNMP package without a full path, so adjust your PATH to include the directory in which snmptranslate resides. Next, we define the TRAP service.
nagios templates.cfg:
define service {
name snmptrap
use generic-service
register 0
service_description TRAP
is_volatile 1
max_check_attempts 1
normal_check_interval 1
retry_check_interval 1
passive_checks_enabled 1
check_period none
check_command check-host-alive
notification_interval 31536000
}
And we use this template when defining actual services like this:
define service{
use snmptrap,alltime_sms
host_name amsctl0
}
alltime_sms is a template with defined contact groups that have SMS targets in them.
Summary: the storage system sends a trap; the snmptrapd daemon handles it by calling the snmptt trap handler; snmptt then calls the submit_check_result script to submit a passive check result to Nagios. Nagios dispatches this submission to the corresponding host service and takes the appropriate action.
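The last hop of that chain can be sketched as follows. This is a minimal stand-in for a submit_check_result script, not necessarily the one shipped with your Nagios: it writes a PROCESS_SERVICE_CHECK_RESULT external command to the Nagios command file. The command file path, the NAGIOS_CMD_FILE override variable and the helper name format_passive_result are assumptions made for illustration.

```shell
# Sketch only: a minimal stand-in for a submit_check_result script.

# Build one external-command line for a passive service check result.
# Arguments: timestamp host service return_code plugin_output
format_passive_result() {
    printf '[%s] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%s;%s\n' "$1" "$2" "$3" "$4" "$5"
}

# Append the result to the Nagios command file.
# Arguments: host service return_code plugin_output
# The default path is an assumption; NAGIOS_CMD_FILE is a hypothetical override.
# "date +%s" needs a date that supports %s (the stock Solaris 10 date does not).
submit_check_result() {
    cmdfile=${NAGIOS_CMD_FILE:-/usr/local/nagios/var/rw/nagios.cmd}
    format_passive_result "$(date +%s)" "$1" "$2" "$3" "$4" >> "$cmdfile"
}

# With the EXEC line generated above, snmptt would end up calling roughly:
#   submit_check_result amsctl0 TRAP 1 "text of the trap"
```

The return code 1 passed by the EXEC line puts the TRAP service into WARNING, and because the service is volatile with max_check_attempts 1, each received trap should be handled as a new event.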
Filed under Nagios, Plugins, Storage
Monitoring Sun A1000 Array with nagios and nrpe
Posted on January 21st, 2010
I’ve got a legacy Sun E3500 system with a Sun A1000 array. It runs Solaris 8. This is how the array can be monitored using the Nagios NMS.
Filed under Plugins, Storage, Sun Hardware
Monitoring correctable memory errors
Posted on January 20th, 2010
If you have a Sun Fire server, from time to time you will see the kernel notifying you about correctable memory errors (CEs). If there are too many errors on a memory module, the kernel removes the corresponding physical page from service. If too many pages have been removed, the DIMM must be replaced. You will want to monitor CE errors and DIMM status so that you can react to these events proactively.
Filed under Plugins, Sun Hardware