Monitor Dell Server Hardware in Nagios with SNMP v3

RHEL

Select the version of Dell OpenManage (DOM) you need to install and follow the instructions here: http://linux.dell.com/repo/hardware/ The following steps worked for the Dell PowerEdge (PE) 1950 and PE 2950.

THIS DOES NOT WORK FOR PE 2550’s. You need an older version of DOM to monitor these. For PE 2550’s I logged into support.dell.com, entered the serial number, and downloaded the appropriate version of DOM for that hardware.

Add Dell Repository to your Server

wget -q -O - http://linux.dell.com/repo/hardware/OMSA_6.3/bootstrap.cgi | bash

Install DOM on Remote Server

Now you can install srvadmin-all with the yum command: yum install srvadmin-all

Restart snmpd service: sudo /sbin/service snmpd restart
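Once snmpd has restarted, one way to confirm that OpenManage is actually feeding data into SNMP is to walk Dell's enterprise subtree (1.3.6.1.4.1.674) and see whether anything comes back. The helper below is only a sketch and not part of the Dell instructions: the name has_dell_oids is made up, the credentials in the example are placeholders, and it assumes the walk was run with -On so that OIDs are printed numerically.

```shell
# Hypothetical helper: succeeds if the snmpwalk output on stdin contains
# at least one OID under Dell's enterprise subtree (.1.3.6.1.4.1.674).
# Assumes the walk was run with -On (numeric OIDs).
has_dell_oids() {
    grep -q '^\.1\.3\.6\.1\.4\.1\.674\.'
}

# Example (user, passwords and HOST_IP are placeholders):
#   snmpwalk -On -v 3 -l authPriv -u snmpuser -a SHA -A AuthPasswd \
#       -x AES -X PrivPasswd HOST_IP .1.3.6.1.4.1.674 | has_dell_oids \
#       && echo "Dell OIDs present" || echo "no Dell OIDs returned"
```

If nothing is returned, the Troubleshooting notes further down are the first place to look.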

SuSE 10. i386

Download the tarball from Dell Support

Extract the tarball: tar xvzf OM-SrvAdmin-Dell-Web-LX-XXX.tar.gz

Install the services: sudo sh linux/srvadmin-install.sh

Follow the prompts to install. I had an issue installing all so I just selected the necessary packages.

Restart snmpd: sudo /sbin/service snmpd restart

Troubleshooting

If the ipmi service doesn’t start then you will not get system specific SNMP variables returned.  I found a solution here:

http://lists.us.dell.com/pipermail/linux-poweredge/2008-October/037701.html

Running the /sbin/start_udev command created the /dev/ipmi0 character device.
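A small check like the following can tell you whether the device node is there before you go digging through the ipmi service logs. This is a minimal sketch; the function name ipmi_device_present is my own, and /dev/ipmi0 is simply the path mentioned above.

```shell
# Hypothetical helper: succeeds if the given path exists and is a
# character device. Defaults to /dev/ipmi0, the node named above.
ipmi_device_present() {
    dev="${1:-/dev/ipmi0}"
    [ -c "$dev" ]
}

# Example: only run start_udev when the device node is missing.
#   ipmi_device_present || sudo /sbin/start_udev
```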

Cacti SNMP Disk IO

Taken from here: http://www.goldfisch.at/knowwiki/howtos/cacti

disk I/O

http://forums.cacti.net/about8777-0-asc-0.html is the place to start. There are 7 pages of discussion about things that work or don't work. It took me 5 days to get it working, so these notes are meant to make it quicker next time.

  1. check if your host even delivers snmp-DISKIO-data at all:

    snmpwalk -v1 -c COMMUNITYNAME  HOST-IP .1.3.6.1.4.1.2021.13.15.1.1.1

    should return a number of DISKIO-MIB lines. If it does not, you need to rebuild net-snmp with the diskio module. Modern Ubuntu releases, at least, have the diskio module precompiled into their snmp packages.

  2. download diskio.tgz (all credit to gandalf and rodre in the above thread; I just put a copy of their work on my server).
  3. extract the archive
  4. put net-snmp_devio.xml in /usr/share/cacti/site/resource/snmp_queries/ or wherever your cacti installation keeps its XML query files (it's the same folder where net-snmp_disk.xml is located)
  5. import cacti_data_query_ucdnet_device_io.xml
  6. for each device whose disk I/O you want to monitor, go to the device menu and under “associated data-query” choose and add the data query ucd/net Device I/O.
  7. still in the device menu, go to “create graphs for this host”; at the very bottom of the page !! you find all your devices and can select them, select the graph type (bytes read/write – average load – reads/writes) and create a load of graphs. If you have more than 20 devices, the next button will help you find the other ones 🙂
  8. wait 5 minutes to be sure the poller has collected data before trying to view a graph or panicking over debug output that complains about missing rrd files. They are created the first time the poller runs.
  9. if things go wrong, run a verbose query in the associated data-queries section of your device, and the same in other places like graph management, then read the 7 pages in the above-mentioned link to possibly find your problem !!
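Step 1 is the one worth scripting, since everything after it is wasted effort if the agent has no diskio data. The sketch below counts the diskIOIndex rows in snmpwalk output read from standard input. The function name count_diskio_rows is made up for illustration, and it assumes the output uses either the symbolic name diskIOIndex or the numeric OID from step 1.

```shell
# Hypothetical helper: prints how many diskIOIndex rows appear in the
# snmpwalk output on stdin. Matches both the symbolic form
# (UCD-DISKIO-MIB::diskIOIndex.N) and the numeric form of the OID.
count_diskio_rows() {
    # grep -c prints 0 but exits non-zero when nothing matches,
    # so "|| true" keeps the function's exit status at 0.
    grep -c -E 'diskIOIndex|^\.1\.3\.6\.1\.4\.1\.2021\.13\.15\.1\.1\.1\.' || true
}

# Example (COMMUNITYNAME and HOST-IP as in step 1):
#   snmpwalk -v1 -c COMMUNITYNAME HOST-IP .1.3.6.1.4.1.2021.13.15.1.1.1 | count_diskio_rows
```

A count of 0 means the host is not delivering diskio data and the remaining steps will not produce graphs.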

Monitor HP Proliant DL360 on Nagios and SNMP v3

RHEL5

Download hp-health.XXX.rpm from here

http://h20000.www2.hp.com/bizsupport/TechSupport/SoftwareDescription.jsp?lang=en&cc=us&swItem=MTX-83c9772afe784cb4b0bad42f57&refresh=true

Download hp-snmp-agents from here:

http://h20000.www2.hp.com/bizsupport/TechSupport/SoftwareDescription.jsp?lang=en&cc=us&prodTypeId=15351&prodSeriesId=452749&prodNameId=3288142&swEnvOID=4006&swLang=8&mode=2&taskId=135&swItem=MTX-f0a7ddbd9a1b4be4acc735a541

RHEL4

These instructions assume you already have SNMP version 3 configured on the RHEL4 HP Proliant DL360 server, and that another server has Nagios installed and working.

Download and Install HP RPMs

Necessary RPMs are hp-health and hp-snmp-agents.

You can go to the below link to download hp-health for RHEL4 x86 directly: http://h20000.www2.hp.com/bizsupport/TechSupport/SoftwareDescription.jsp?lang=en&cc=us&prodTypeId=15351&prodSeriesId=452749&prodNameId=3288142&swEnvOID=2025&swLang=8&mode=2&taskId=135&swItem=MTX-11651fcb8d1b4b3fb224959c4e

You can go to the below link to download hp-snmp-agents for RHEL4 x86 directly:

http://h20000.www2.hp.com/bizsupport/TechSupport/SoftwareDescription.jsp?lang=en&cc=us&prodTypeId=15351&prodSeriesId=452749&prodNameId=3288142&swEnvOID=2025&swLang=8&mode=2&taskId=135&swItem=MTX-15f072a096134b8397225e4612

First install the hp-health RPM: rpm -ivh hp-health-XXX.rpm

Start the service: service hp-health start

Install the hp-snmp-agents RPM: rpm -ivh hp-snmp-agents.XXX.rpm

Start the service: service hp-snmp-agents start

Edit the snmpd configuration file (typically /etc/snmp/snmpd.conf) to load the HP MIB module into net-snmp: dlmod cmaX /usr/lib/libcmaX.so

On 64-bit systems, use this line instead: dlmod cmaX /usr/lib64/libcmaX64.so
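If you manage both 32-bit and 64-bit hosts, picking the right dlmod line can be scripted. The following is a rough sketch, not something shipped with the HP packages: cmax_dlmod_line is a made-up name, it treats only x86_64 as 64-bit, and the snmpd.conf path in the example is an assumption about a default RHEL layout.

```shell
# Hypothetical helper: prints the dlmod line for the given architecture
# (defaults to the output of uname -m). Only x86_64 is treated as 64-bit.
cmax_dlmod_line() {
    arch="${1:-$(uname -m)}"
    case "$arch" in
        x86_64) echo "dlmod cmaX /usr/lib64/libcmaX64.so" ;;
        *)      echo "dlmod cmaX /usr/lib/libcmaX.so" ;;
    esac
}

# Example: append the line only if it is not already present
# (the path /etc/snmp/snmpd.conf is an assumption).
#   line="$(cmax_dlmod_line)"
#   grep -qxF "$line" /etc/snmp/snmpd.conf || echo "$line" >> /etc/snmp/snmpd.conf
```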

Configure the SNMP v3 user: net-snmp-config --create-snmpv3-user -ro -A AuthPasswd -X PrivPasswd -a SHA -x AES snmpuser

Restart the SNMP service: service snmpd restart

Verify you get a response from snmpwalk from your Nagios monitoring server:

snmpwalk -l authPriv -u "snmpuser" -X "PrivPasswd" -A "AuthPasswd" -a SHA -x AES -v 3 "HOST_IP" 1.3.6.1.4.1.232.6.2.6.8.1.3

(1.3.6.1.4.1.232.6.2.6.8.1.3 is an HP-specific OID.)
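Typing the full set of v3 options each time gets tedious, so a thin wrapper can help while testing. This is only a sketch under a few assumptions: the function name snmpv3_walk and the environment variables SNMP_USER, SNMP_AUTH_PASS, SNMP_PRIV_PASS and SNMPWALK are my own inventions, and SHA/AES are hard-coded to match the user created above.

```shell
# Hypothetical wrapper around snmpwalk for an authPriv SNMP v3 user.
# Credentials come from environment variables (names are made up here).
# SNMPWALK can be overridden, e.g. with "echo", to preview the arguments.
snmpv3_walk() {
    host="$1"
    oid="$2"
    : "${SNMP_USER:?set SNMP_USER}"
    : "${SNMP_AUTH_PASS:?set SNMP_AUTH_PASS}"
    : "${SNMP_PRIV_PASS:?set SNMP_PRIV_PASS}"
    "${SNMPWALK:-snmpwalk}" -v 3 -l authPriv -u "$SNMP_USER" \
        -a SHA -A "$SNMP_AUTH_PASS" -x AES -X "$SNMP_PRIV_PASS" \
        "$host" "$oid"
}

# Example:
#   SNMP_USER=snmpuser SNMP_AUTH_PASS=AuthPasswd SNMP_PRIV_PASS=PrivPasswd
#   export SNMP_USER SNMP_AUTH_PASS SNMP_PRIV_PASS
#   snmpv3_walk HOST_IP 1.3.6.1.4.1.232.6.2.6.8.1.3
```

Note that the passwords still end up on the snmpwalk command line, so this is a convenience for testing and nothing more.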

Add Nagios Service Check

The next step is to set up a Nagios check that accepts SNMP version 3 parameters. I downloaded check_hpasm from Nagios Exchange and installed it on my Nagios server. From the command line, I made sure it worked by calling check_hpasm with the following options: ./check_hpasm -H HOSTNAME/IP -P 3 --username snmpuser --authpassword snmppassword

This should work with the correct username and password.
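When testing a plugin by hand it helps to look at its exit status as well as its output, because Nagios decides the service state from the exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). The sketch below translates a status into its name; nagios_state_name is a made-up helper, and mapping every other code to UNKNOWN is a simplification of mine.

```shell
# Hypothetical helper: translate a Nagios plugin exit status into the
# state name. 3 is UNKNOWN; in this sketch any unexpected value is
# reported as UNKNOWN as well.
nagios_state_name() {
    case "$1" in
        0) echo OK ;;
        1) echo WARNING ;;
        2) echo CRITICAL ;;
        *) echo UNKNOWN ;;
    esac
}

# Example:
#   ./check_hpasm -H HOSTNAME/IP -P 3 --username snmpuser --authpassword snmppassword
#   nagios_state_name $?
```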

Add this to your commands.cfg file

# 'check_hpasm_v3' command definition
define command {
        command_name    check_hpasm_v3
        command_line    $USER1$/check_hpasm -H $HOSTADDRESS$ -P 3 --username $ARG1$ --authpassword $ARG2$
        }


Add this to a service check .cfg file:

define service{
        use                      linux-service
        host_name                g05
        service_description      HPASM SNMP v3 Check
        check_command            check_hpasm_v3!snmpuser!AuthPasswd
        normal_check_interval    5
        retry_check_interval     1
        }

Make sure to run a Nagios configuration check before restarting.
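The check-then-restart habit is easy to wrap so that a broken config never reaches a restart. This is a minimal sketch: nagios -v is the standard verification flag, but the function name nagios_safe_restart, the override variables NAGIOS_BIN, NAGIOS_CFG and RESTART_CMD, and the default paths under /usr/local/nagios (the layout of a source install) are assumptions you may need to adjust.

```shell
# Hypothetical helper: verify the Nagios configuration and restart only
# if verification succeeds. Default paths assume a source install under
# /usr/local/nagios; override them through the environment if needed.
nagios_safe_restart() {
    nagios_bin="${NAGIOS_BIN:-/usr/local/nagios/bin/nagios}"
    nagios_cfg="${NAGIOS_CFG:-/usr/local/nagios/etc/nagios.cfg}"
    if "$nagios_bin" -v "$nagios_cfg"; then
        # Intentionally unquoted so the command and its arguments split.
        ${RESTART_CMD:-/sbin/service nagios restart}
    else
        echo "Nagios config check failed; not restarting" >&2
        return 1
    fi
}

# Example:
#   sudo sh -c '. ./nagios_safe_restart.sh; nagios_safe_restart'
```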


Nagios on OpenSuSE

Followed the instructions listed here: http://nagios.sourceforge.net/docs/3_0/quickstart-opensuse.html

Here are the changes I made to get Nagios to play with Apache2 installed via source.

#1 Make sure to add the web server user to the nagcmd group. Mine wasn’t wwwrun (the openSUSE default).

#2 I created a symbolic link from the directory nagios was extracted to as follows: ln -s nagios-X.X.X nagios

#3 When you run ./configure --with-command-group=nagcmd, the summary for my config showed the Apache2 conf.d directory as /etc/apache2/conf.d. Since Apache was installed from source, it was actually located elsewhere on the system. I accepted the defaults and ran make all anyway.

Then I picked up again with make install, make install-init, make install-commandmode, and make install-config.

In addition, you can run make install-webconf, which will put the Apache nagios.conf in the Apache conf.d directory; you will want to copy this to the location of your Apache extra files.

I actually had multiple virtual hosts on my server, so I had to copy the nagios.conf text into a subsection of a virtual host and then reload Apache: service httpd reload

I configured Nagios according to the documentation, started it, and added it to the boot-time services with chkconfig --add nagios
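Going back to change #1: if you are not sure which user your source-built Apache runs as, the process list will tell you. The sketch below reads "user command" pairs on standard input and prints the first non-root user running an httpd or apache2 process. The function name apache_run_user is invented, and the process names it matches are an assumption about how your Apache binary is called.

```shell
# Hypothetical helper: given "USER COMMAND" lines on stdin (for example
# from ps -eo user,comm), print the first non-root user that runs a
# process whose name starts with httpd or apache2.
apache_run_user() {
    awk '$2 ~ /^(httpd|apache2)/ && $1 != "root" { print $1; exit }'
}

# Example:
#   ps -eo user,comm | apache_run_user
```

The user it prints is the one that needs to be in the nagcmd group.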