Akamit Blog

Enterprise sysadmin's blog

Monitoring internal SAS disks on Sun servers

Posted on November 30th, 2010

We need to monitor disk health on a Sun M4000 machine running Solaris 10. The smartmontools package handles this well.

1. Install smartmontools using the Blastwave package system.
2. Check whether the SMART status can be read from the disk:

bash-3.00# /opt/csw/sbin/smartctl -H  /dev/rdsk/c0t0d0s2 
smartctl version 5.36 [sparc-sun-solaris2.8] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
 
SMART Health Status: OK

3. We want to integrate this check into Nagios, so we searched around and found the check_smart.pl plugin. Since it was developed with Linux in mind, we made some corrections and adapted it for Solaris.
4. The plugin calls smartctl, which must run as root while the Nagios plugin itself runs as the nagios user.
We used the Solaris RBAC configuration (pfexec) to let the nagios user run smartctl as root.
Create a profile for Nagios.
file: /etc/security/prof_attr

Nagios:::Nagios Profile:

Add the execution attributes for smartctl.
file: /etc/security/exec_attr

Nagios:suser:cmd:::/opt/csw/sbin/smartctl:uid=0;gid=0;euid=0

Assign the Nagios profile to the nagios user:

usermod -P Nagios nagios

Verify that it works:

bash-3.00# su - nagios
Sun Microsystems Inc.   SunOS 5.10      Generic January 2005
$ bash
bash-3.00$ pfexec /opt/csw/sbin/smartctl -H /dev/rdsk/c0t0d0s2
smartctl version 5.36 [sparc-sun-solaris2.8] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
 
SMART Health Status: OK

5. Next, modify the Linux-friendly plugin so that it runs pfexec smartctl instead of sudo.
6. Modify nrpe.cfg:

command[check_rootdisk]=/usr/local/nagios/libexec/check_smart.pl -e -d /dev/rdsk/c0t0d0s2

Solaris-ready check_smart.pl plugin
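To illustrate what the adapted check has to do, here is a minimal Python sketch (not the check_smart.pl plugin itself, which is written in Perl). It builds the pfexec command line from step 5 and maps the "SMART Health Status" line shown in the transcripts above to a Nagios return code. The function names are hypothetical, and treating any status other than OK as critical is an assumption of this sketch.

```python
import re
import subprocess

# Standard Nagios plugin return codes.
OK, CRITICAL, UNKNOWN = 0, 2, 3

SMARTCTL = "/opt/csw/sbin/smartctl"  # Blastwave install path used in this post


def build_command(device):
    """Command line that runs smartctl through pfexec, as in step 5."""
    return ["pfexec", SMARTCTL, "-H", device]


def parse_smart_health(output):
    """Map smartctl -H output for a SCSI/SAS disk to (code, message)."""
    match = re.search(r"^SMART Health Status:[ \t]*(\S.*?)[ \t]*$",
                      output, re.MULTILINE)
    if match is None:
        return UNKNOWN, "UNKNOWN - no SMART health line found"
    status = match.group(1)
    if status == "OK":
        return OK, "OK - SMART Health Status: OK"
    # Assumption of this sketch: anything other than OK is critical.
    return CRITICAL, "CRITICAL - SMART Health Status: " + status


def check_disk(device):
    """Run smartctl for one device and classify its output."""
    result = subprocess.run(build_command(device),
                            capture_output=True, text=True)
    return parse_smart_health(result.stdout)
```

A real plugin would print the message returned by check_disk("/dev/rdsk/c0t0d0s2") and exit with the returned code, which is what NRPE passes back to Nagios.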

Filed under Plugins | No Comments »

Monitoring free memory in Solaris

Posted on August 4th, 2010

This post introduces a small Nagios plugin for monitoring free physical memory on Solaris. It should be run locally or through the NRPE daemon, and it depends on the Solaris kstat command. A brief description and usage notes are in the plugin file itself.
Nagios plugin for Solaris free memory monitoring
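The plugin itself is not reproduced here, so the following Python sketch only shows one way such a check could work. It assumes the input is the text printed by a command such as kstat -p unix:0:system_pages, where the freemem and physmem statistics are counted in pages. The function names, the thresholds and the page size argument (obtainable from the pagesize command) are choices made for this sketch.

```python
# Standard Nagios plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def parse_kstat(output):
    """Parse 'module:instance:name:statistic<TAB>value' lines into a dict."""
    stats = {}
    for line in output.splitlines():
        parts = line.split()
        # Keep only integer statistics; skip class, crtime and similar.
        if len(parts) == 2 and parts[1].isdigit():
            statistic = parts[0].rsplit(":", 1)[-1]
            stats[statistic] = int(parts[1])
    return stats


def check_free_memory(kstat_output, page_size, warn_pct, crit_pct):
    """Compare the free memory percentage against warning/critical levels."""
    stats = parse_kstat(kstat_output)
    if not stats.get("physmem") or "freemem" not in stats:
        return UNKNOWN, "UNKNOWN - freemem/physmem not found in kstat output"
    free_mb = stats["freemem"] * page_size // (1024 * 1024)
    free_pct = 100.0 * stats["freemem"] / stats["physmem"]
    summary = "%d MB free (%.1f%%)" % (free_mb, free_pct)
    if free_pct <= crit_pct:
        return CRITICAL, "CRITICAL - " + summary
    if free_pct <= warn_pct:
        return WARNING, "WARNING - " + summary
    return OK, "OK - " + summary
```

For example, 250000 free pages out of 1000000 with an 8192-byte page size is reported as 25.0% free, which is OK with a warning level of 20% and a critical level of 10%.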

Filed under Plugins, Solaris | No Comments »

Monitoring server overheat with ipmitool

Posted on June 22nd, 2010

Recently a Sun x4600 server shut down unexpectedly during the night. Investigation revealed that the air conditioning had failed and the machine powered itself off. I dug into the logs and found that the shutdown was initiated by the server’s system controller. An excerpt from the log follows:

System ACPI Power State : sys.acpi : S5/G2: soft-off
Power Supply : ps1.pwrok : State Deasserted
Power Supply : ps3.pwrok : State Deasserted
Power Supply : ps2.pwrok : State Deasserted
Temperature : p2.t_amb : Upper Non-recoverable going high : reading 46 > threshold 45 degrees C
Hot removal of /SYS/PS0
Entity Presence : ps0.prsnt : Device Absent
Processor : p0.cardfail : State Asserted
Temperature : p0.t_amb : Upper Critical going high : reading 39 > threshold 38 degrees C
Processor : p3.cardfail : State Asserted
Temperature : p3.t_amb : Upper Critical going high : reading 39 > threshold 38 degrees C
Processor : p1.cardfail : State Asserted
Temperature : p1.t_amb : Upper Critical going high : reading 39 > threshold 38 degrees C
Processor : p2.cardfail : State Asserted
Temperature : p2.t_amb : Upper Critical going high : reading 39 > threshold 38 degrees C

This Nagios plugin for monitoring server overheating with IPMI will notify me about this kind of event in the future and help avoid unexpected downtime.
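The plugin is not reproduced here; the Python sketch below shows one possible approach. It assumes the five-column, pipe-separated layout that ipmitool sdr type Temperature typically prints (name, sensor ID, status, entity, reading) and relies on the status column (ok, nc, cr, nr) rather than comparing readings itself. The function name and the mapping of nc to WARNING are assumptions of this sketch.

```python
# Standard Nagios plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# ipmitool status column: ok, non-critical, critical, non-recoverable.
SEVERITY = {"ok": OK, "nc": WARNING, "cr": CRITICAL, "nr": CRITICAL}
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}


def check_temperatures(sdr_output):
    """Classify 'name | id | status | entity | reading' sensor lines."""
    worst = OK
    seen = 0
    problems = []
    for line in sdr_output.splitlines():
        fields = [field.strip() for field in line.split("|")]
        # Skip malformed lines and sensors without a usable status (e.g. ns).
        if len(fields) != 5 or fields[2] not in SEVERITY:
            continue
        name, status, reading = fields[0], fields[2], fields[4]
        seen += 1
        if SEVERITY[status] > OK:
            worst = max(worst, SEVERITY[status])
            problems.append("%s: %s (%s)" % (name, reading, status))
    if seen == 0:
        return UNKNOWN, "UNKNOWN - no temperature sensors found"
    if problems:
        return worst, "%s - %s" % (LABELS[worst], ", ".join(problems))
    return OK, "OK - %d temperature sensors within thresholds" % seen
```

Run every few minutes, a check like this would have raised an alert when the ambient sensors crossed their critical thresholds, before the system controller powered the server off.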

Filed under Plugins, Sun Hardware | No Comments »

Monitoring HDS AMS storage with nagios using SNMP protocol

Posted on June 16th, 2010

We are going to drop the proprietary HDS Hi-Track software and use the industry-standard SNMP protocol to monitor Hitachi midrange storage systems. The NMS of choice is Nagios. In general there are two ways to monitor over SNMP, active polling and passive trap handling, and Nagios supports both. You first need to configure the storage system’s SNMP agent and specify the SNMP community and the SNMP trap destination. When that is done, check that it works by running the check_snmp plugin.

# ./check_snmp -H ams-ctl0 -C public -o sysDescr.0 -P 1
SNMP OK - HITACHI  DF600F           Ver 0781/A-M |

With the SNMP agent configured, we can move on to configuring the SNMP manager, i.e. Nagios.

1. Active monitoring, using a plugin that sends an SNMP GET request for a given OID. Here is a small Hitachi AMS storage monitoring Nagios plugin that does the job. It is ready to run under the Nagios embedded Perl interpreter (ePN).

2. Passive monitoring, by handling SNMP traps.
First, install the snmptrapd daemon from the Net-SNMP project and point it at the program that will handle all incoming traps. We chose to run snmptt on every trap event. The snmptrapd configuration looks like this:

traphandle default /usr/sbin/snmptt
disableAuthorization yes
donotlogtraps  yes

Next, configure snmptt itself so that it knows what to do when a trap arrives. Open the snmptt.ini config file and create the [TrapFiles] section:

[TrapFiles]
snmptt_conf_files = <<END
/usr/local/etc/snmptt.conf.AMS500
END

Next, create snmptt.conf.AMS500 by running the snmpttconvertmib tool on dfraid.mib, which is on the AMS500 SNMP CD.

# export PATH=$PATH:/usr/local/bin
# snmpttconvertmib --in=dfraid.mib \
> --out=/usr/local/etc/snmptt.conf.AMS500 \
> --exec='/usr/local/nagios/libexec/eventhandlers/submit_check_result $r TRAP 1'

snmpttconvertmib calls snmptranslate from the Net-SNMP package without a full path, so make sure your PATH includes the directory where snmptranslate resides. Next, we define the TRAP service.
Nagios templates.cfg:

define service {
   name                    snmptrap
   use                     generic-service
   register                0
   service_description     TRAP
   is_volatile             1
   max_check_attempts      1
   normal_check_interval   1
   retry_check_interval    1
   passive_checks_enabled  1
   check_period            none
   check_command           check-host-alive
   notification_interval   31536000
}

And we use this template when defining actual services like this:

define service {
   use                     snmptrap,alltime_sms
   host_name               amsctl0
}

alltime_sms is a template that defines contact groups containing SMS targets.

Summary: the storage system sends a trap, the snmptrapd daemon handles it by calling the snmptt trap handler, and snmptt then calls the submit_check_result script to submit a passive check result to Nagios. Nagios dispatches this result to the corresponding host's service and takes the appropriate action.
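The last hop of that chain can be sketched in Python. A passive service result reaches Nagios as a PROCESS_SERVICE_CHECK_RESULT line written to its external command file. The sketch below formats such a line with the same arguments the --exec option above passes to submit_check_result (host, the TRAP service and return code 1). The function names are hypothetical, the trap text in the usage note is made up, and the command file path depends on your Nagios installation.

```python
import time


def format_passive_result(host, service, return_code, output, timestamp=None):
    """Build one external command line carrying a passive service result."""
    if timestamp is None:
        timestamp = int(time.time())
    # An external command must fit on one line, so flatten newlines.
    text = output.replace("\n", " ")
    return "[%d] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%d;%s\n" % (
        timestamp, host, service, return_code, text)


def submit(command_file, line):
    """Append the command to the Nagios external command file (a named pipe)."""
    with open(command_file, "a") as pipe:
        pipe.write(line)
```

For instance, format_passive_result("amsctl0", "TRAP", 1, "example trap text") produces a WARNING result for the TRAP service on amsctl0. Because the service is volatile with max_check_attempts set to 1, every such submission triggers a notification.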

Filed under Nagios, Plugins, Storage | No Comments »

Monitoring Sun A1000 Array with nagios and nrpe

Posted on January 21st, 2010

I have a legacy Sun E3500 system with a Sun A1000 array, running Solaris 8. This is how the array can be monitored with the Nagios NMS.
Read the rest of this entry »

Filed under Plugins, Storage, Sun Hardware | No Comments »

Monitoring correctable memory errors

Posted on January 20th, 2010

If you have a Sun Fire server, from time to time you will see the kernel notifying you about correctable memory errors. If there are too many errors on a memory module, the kernel removes the corresponding physical page from service. If too many pages are removed, the DIMM must be replaced. You will want to monitor CE errors and DIMM status so that you can react proactively to these events.
Read the rest of this entry »

Filed under Plugins, Sun Hardware | No Comments »