nagios plugins

This is a collection of Nagios plugins and other tools I've written. They're usually not pretty, but you might find them useful.

Most of them are intended to run on CentOS Linux, or Sun Solaris 5.8 through 5.11, with Nagios 2.1 through 4.1. Most of them are in Perl.

I license anyone to copy, use and distribute them under the terms of the GNU General Public License.

ack_matching
ack all hosts and services that match a pattern
ack_matching.cgi
cgi to ack all hosts and services that match a pattern
cancel_downtime_matching
cancel downtime on all hosts and services that match a pattern
cancel_downtime_matching.cgi
cgi to cancel downtime on all hosts and services that match a pattern
check_3510
Check a Sun 3510 disk array for faults. Runs on the FC-connected Sun server.
check_6140
Check a Sun 6140 drive array. This uses the Sun sscs command, so it's very slow. It can often take 2 minutes to run. I use it under the passive_checks command below.
check_age.c
Check a file's age. Useful for things that are supposed to get updated on a regular basis: logs, transfer files, etc.
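
As an illustration of the idea only (the plugin itself is written in C), a file-age check can be sketched in Python as below. The function name `check_age` and the threshold parameters are hypothetical; the return codes follow the usual Nagios plugin convention of 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN.

```python
import os
import time

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

def check_age(path, warn_secs, crit_secs, now=None):
    """Return (exit_code, message) based on the file's modification age.

    Hypothetical helper: thresholds are in seconds since last modification.
    """
    now = time.time() if now is None else now
    try:
        age = now - os.stat(path).st_mtime
    except OSError as err:
        # A missing or unreadable file is reported as UNKNOWN here.
        return UNKNOWN, f"UNKNOWN: cannot stat {path}: {err}"
    if age >= crit_secs:
        return CRITICAL, f"CRITICAL: {path} is {int(age)}s old"
    if age >= warn_secs:
        return WARNING, f"WARNING: {path} is {int(age)}s old"
    return OK, f"OK: {path} is {int(age)}s old"
```

A real plugin would print the message and exit with the code.
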
check_airtalk
Check the output of an Airtalk uM260 monitoring device. This is a very specialized monitoring unit for air-pressurized telco cables. It's designed to send alarms to Airtalk's central alarm software. However, it only sends threshold crossings. If you want to know about bad/open/shorted transducers, you need to poll it. It will also warn when you're within 5 percent of crossing the threshold.

This depends on another program, airtalk_get_readings, which actually retrieves the readings and runs under cron. Also see airtalk_get_one_reading and airtalk.cron.

check_akcp_sp2
Check an AKCP environmental probe for temperature or humidity.
check_apc_ups
Check an APC UPS.
check_aruba
Check an Aruba Wireless controller or AP.

This needs the UD::SNMP perl module.

check_bgp_neighbor
Check BGP neighbors on a router.
check_bmc
Check the Baseboard Management Controller on a Sun Solaris i86pc system for hardware faults.
check_clearpass
Check an Aruba Clearpass server/appliance.
check_cluster_livestatus
Check a "cluster" of other Nagios service checks via livestatus.
check_conman
Check TWS/Maestro conman status. Used by check_maestro. Uses a config file.
check_dell_idrac
Check a Dell iDrac service processor via IPMI.
check_dell_idrac8
Check a Dell iDrac service processor via IPMI.
check_dhcp_leases
Check on a DHCP sharednet's address utilization.
check_dircount.c
Count the files in a directory. Useful for monitoring mail or print spools.
check_disk_error_rates
Check iostat -e for new disk errors.
check_elom
Check a Sun Embedded Lights Out Management service processor for hardware faults. Uses snmp over the management network interface.
check_emerson_netsure
Check on an Emerson Netsure -48V DC power supply.
check_exchange_health.pl
Check Exchange mail server health.
check_filesize
Monitor the size of a file, e.g. a log file.
check_fmdump
Check the output of the Solaris 10 or 11 fault manager daemon.
check_google_spf
Check Google SPF record.
check_grouper
Check Tomcat grouper status page.
check_hdadm
Check Sun Fishworks hdadm drive smart status.
check_host_via_ping_service
Check a host's up/down status by examining the status of recent ping service attempts to the host. The normal host check is just a ping. This lets you bypass that and use the result of your last ping service check.
check_hrStorage
Check a remote filesystem via SNMP hrStorage.
check_https_content
The normal Nagios http plugin can't check the HTTPS SSL certificate and the page content at the same time. This wrapper runs it twice to do the two tests.
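
How the two results are merged is not described here; one plausible approach, sketched in Python, is to report the worse of the two exit codes and join the outputs. The function name `combine` and the severity ordering (OK < UNKNOWN < WARNING < CRITICAL) are assumptions, not taken from the wrapper itself.

```python
# Assumed severity ranking of Nagios exit codes: OK < UNKNOWN < WARNING < CRITICAL.
_SEVERITY = {0: 0, 3: 1, 1: 2, 2: 3}

def combine(results):
    """Merge [(exit_code, output), ...] into one (exit_code, output).

    The worst exit code wins; the outputs are joined so both tests stay visible.
    """
    worst = max((code for code, _ in results),
                key=lambda code: _SEVERITY.get(code, 1))
    return worst, "; ".join(output for _, output in results)
```
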
check_ifs_up
Check if a router's interfaces are up.
check_ilom
Check a Sun ilom service processor.
check_ipf
Check Sun ipfilter status.
check_iptables
Check Linux iptables status.
check_juniper_firmware
Check a Juniper router or switch's firmware, including backup boot.
check_juniper_lacp
Check LACP protocol and LAG links on a Juniper router or switch.
check_juniper_optics
Check sfp/sfp+/xfp optics on a Juniper router or switch, including low RX optical power.
check_license_file
Check flexlm license file.
check_liebert_crac
Check a Liebert computer room air conditioner.
check_liebert_ups
Check a Liebert UPS.
check_lmstat
Run lmstat and check output for errors.
check_log_rate
Check how fast a log file is growing.
check_maestro
Check TWS/Maestro. Uses check_conman.
check_memory_error_rates
Check Solaris memory error rates, from dmesg.
check_metastat
Check Sun Solaris metastat output for software raid errors.
check_mounts
Check that all filesystems are properly mounted. Sometimes, when Linux sees enough disk errors, it will switch a mount to read-only.
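
A minimal Python sketch of the read-only detection, assuming input in the Linux /proc/mounts format (device, mount point, type, options, dump, pass). The function name `check_mounts` and the `expected` list are hypothetical; the actual plugin may decide what should be mounted in a different way.

```python
def check_mounts(mounts_text, expected):
    """Return (missing, read_only) lists for the mount points in `expected`.

    `mounts_text` is the contents of a /proc/mounts style file.
    """
    options_by_mount = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4:
            # A later entry for the same mount point overrides an earlier one.
            options_by_mount[fields[1]] = fields[3].split(",")
    missing = [m for m in expected if m not in options_by_mount]
    read_only = [m for m in expected
                 if m in options_by_mount and "ro" in options_by_mount[m]]
    return missing, read_only
```
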
check_mqueues.sh
Check a bunch of mail queues.
check_nagios_latency
Check Nagios server's latency.
check_netapp
Check a NetApp filer.
check_netdev
Check a few things on a generic snmp network device.
check_nf_conntrack
Check the number of Linux iptables connection tracking state entries.
check_nrpe_via
Check an NRPE command on a remote host, via an intermediate host. This lets you "jump" an nrpe query through a "proxy" host to reach a host that you normally can't connect to.
check_ntp
Check NTP server.
check_orca
Check that orca rrds are updating properly.
check_powerdsine
Check a PowerDsine midspan power over ethernet injector models 6000 and 6500. Note, 6000s often get confused and stop responding to snmp.
check_proc_pgrep
Check a process using pgrep.
check_prtdiag
Check Sun Solaris prtdiag -v output for hardware faults. prtdiag's output was intended for use by humans, and has no consistent format. It's different on almost every combination of platform, os revision, and patch level. So this is a big, ugly pile of regexps.
check_ps_multi.pl
Check on a bunch of processes. Uses a remote config file.
check_sakai_login
Log into Sakai (Java-based course management software), check the page, log back out, and complain if it takes too long.
check_sonet_errors
Check a Cisco/Cerent 15454 for sonet errors, via snmp.
check_stp
Check on a switch's spanning tree.
check_switch
Check a bunch of things about a network switch, or router.
check_sysUpTime
Check snmp sysUpTime, and complain if the device rebooted.
check_systemd
Check if Linux systemd has any errors.
check_temp_snmp
Check Cisco router and switch temperature sensors.
check_tftp
Check a TFTP server.
check_x24
Check a Xyratex x24 disk array controller.
check_xscf
Check an m4000/m5000 xscf service processor.
check_yum
Run yum check-update to look for available Linux updates.
check_zpool
Check ZFS disk pool.
count_log_lines.c
Count the lines added to a log file in the last 5 minutes. This works by seeking back about 1MB from the end of the file, and parsing the dates in the log file lines. When you've got a multi-gigabyte log file, this is much faster than wc.
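
The seek-back technique can be sketched in Python as follows (the tool itself is C). The function name `count_recent_lines` is hypothetical, and the sketch assumes each line starts with a `YYYY-MM-DD HH:MM:SS` timestamp; the real tool's date parsing is not described here and probably differs.

```python
import os
from datetime import datetime, timedelta

def count_recent_lines(path, now, window=timedelta(minutes=5),
                       tail_bytes=1024 * 1024):
    """Count lines near the end of `path` whose timestamp is within `window` of `now`."""
    count = 0
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)
        start = max(0, f.tell() - tail_bytes)
        f.seek(start)
        if start > 0:
            # We probably landed mid-line, so skip to the next line start.
            f.readline()
        for raw in f:
            try:
                # Assumed format: the first 19 bytes are the timestamp.
                stamp = datetime.strptime(raw[:19].decode("ascii"),
                                          "%Y-%m-%d %H:%M:%S")
            except ValueError:
                continue  # Not a dated line (also covers non-ASCII bytes).
            if now - window <= stamp <= now:
                count += 1
    return count
```

Only the last `tail_bytes` of the file are read, which is why the cost stays flat as the log grows.
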
downtime_matching
Schedule downtime on matching hosts and services, cli version.
downtime_matching.cgi
Schedule downtime on matching hosts and services, web cgi version.
gen_flat_status_page
Generate a single, flat, simple, html page with all current nagios status info. Having multiple workstations display this is much easier on the server than having them all use the status cgi or thruk.
immediately_check_matching
Schedule an immediate check of matching hosts and services.
immediately_check_service_on_all_hosts
Send commands to nagios to immediately check a specified service on all hosts where it's defined.
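
For reference, a forced service check is requested by writing a `SCHEDULE_FORCED_SVC_CHECK` external command line to the Nagios command file. The Python sketch below shows the line format; the function names are hypothetical, and the host list is passed in by the caller, whereas the real tool works out which hosts define the service.

```python
def forced_check_lines(hosts, service, now):
    """Build one SCHEDULE_FORCED_SVC_CHECK line per host, due at time `now`."""
    stamp = int(now)
    return [f"[{stamp}] SCHEDULE_FORCED_SVC_CHECK;{host};{service};{stamp}\n"
            for host in hosts]

def send_commands(command_file, lines):
    """Write command lines to the Nagios command file (normally a named pipe)."""
    # On a named pipe, "w" mode has nothing to truncate.
    with open(command_file, "w") as pipe:
        pipe.writelines(lines)
```
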
log_ctime
Handy utility to convert the nagios log timestamps from seconds since epoch to something more human readable. Reads from stdin, writes to stdout.
	grep myhost var/nagios.log | log_ctime 
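
The conversion itself is small. Nagios log lines begin with `[epoch-seconds]`, so a Python sketch might look like this; the function name `convert_line` and the output date format are assumptions, not what log_ctime necessarily prints.

```python
import re
from datetime import datetime, timezone

_STAMP = re.compile(r"^\[(\d+)\]")

def convert_line(line, tz=None):
    """Replace a leading [epoch] with a readable date; tz=None means local time."""
    def repl(match):
        when = datetime.fromtimestamp(int(match.group(1)), tz)
        return "[" + when.strftime("%Y-%m-%d %H:%M:%S") + "]"
    # Lines without a leading [digits] stamp pass through unchanged.
    return _STAMP.sub(repl, line)
```

Used as a filter, this would be applied to each line of stdin and the result written to stdout.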
    
notify_by_epager
Rate limit pager notifications, so a power outage doesn't send 400 pages to the oncall guy.
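
The limiting policy is not spelled out here. As one possible approach, a sliding-window limiter could be sketched in Python like this; the class name `PageLimiter` and its parameters are hypothetical and not taken from the script.

```python
from collections import deque

class PageLimiter:
    """Allow at most `max_pages` pages in any `window_secs` span (assumed policy)."""

    def __init__(self, max_pages, window_secs):
        self.max_pages = max_pages
        self.window_secs = window_secs
        self.sent = deque()  # timestamps of pages already sent

    def allow(self, now):
        # Forget pages that have aged out of the window.
        while self.sent and now - self.sent[0] >= self.window_secs:
            self.sent.popleft()
        if len(self.sent) < self.max_pages:
            self.sent.append(now)
            return True
        return False
```

A notification script that runs once per alert would need to keep the timestamps in a file between runs, since this sketch holds them in memory only.
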
passive_checks
Some checks take too long to run from inside nagios. This runs a configurable list of plugins, and feeds the result to nagios as passive checks via command pipe or NSCA.

Uses a config file.
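
To show what feeding a passive result looks like, here is a Python sketch that runs a plugin and formats its result as a `PROCESS_SERVICE_CHECK_RESULT` external command line. The function names and the 300 second timeout are hypothetical, and the sketch ignores the config file and NSCA transport that the real tool supports.

```python
import subprocess

def run_plugin(argv, timeout=300):
    """Run a plugin command; return (exit_code, first line of its output)."""
    try:
        proc = subprocess.run(argv, capture_output=True, text=True,
                              timeout=timeout)
    except (OSError, subprocess.TimeoutExpired) as err:
        return 3, f"UNKNOWN: {err}"  # 3 is the Nagios UNKNOWN state
    lines = proc.stdout.splitlines()
    return proc.returncode, (lines[0] if lines else "(no output)")

def passive_result_line(host, service, code, output, now):
    """Format one passive service check result for the Nagios command file."""
    return (f"[{int(now)}] PROCESS_SERVICE_CHECK_RESULT;"
            f"{host};{service};{code};{output}\n")
```

The resulting line is what gets written to the Nagios command pipe.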

rm_matching_comments
Bulk remove comments that match criteria, cli version.
rm_matching_comments.cgi
Bulk remove comments that match criteria, web cgi version.
show_service_output
Show matching hosts/services with plugin output.