Nagios plugins
This is a collection of Nagios plugins and other tools that I've written.
They're usually not pretty, but you might find them useful.
Most of them are intended to run on CentOS Linux or Sun Solaris 5.8
through 5.11, with Nagios 2.1 through 4.1. Most of them are in Perl.
I license anyone to copy, use and distribute them under the terms of the
GNU General Public License.
- ack_matching
- Acknowledge problems on all hosts and services that match a pattern.
- ack_matching.cgi
- CGI to acknowledge problems on all hosts and services that match a pattern.
- cancel_downtime_matching
- Cancel downtime on all hosts and services that match a pattern.
- cancel_downtime_matching.cgi
- CGI to cancel downtime on all hosts and services that match a pattern.
- check_3510
- Check a Sun 3510 disk array for faults. Runs on the FC-connected Sun server.
- check_6140
- Check a Sun 6140 drive array. This uses Sun's sscs command, so it's very slow; it can often take 2 minutes to run. I run it under the passive_checks command below.
- check_age.c
- Check a file's age. Useful for things that are supposed to get
updated on a regular basis: logs, transfer files, etc.
- check_airtalk
- Check the output of an Airtalk uM260 monitoring device. This is a
very specialized monitoring unit for air-pressurized telco cables.
It's designed to send alarms to Airtalk's central alarm software.
However, it only sends threshold crossings. If you want to know about
bad, open, or shorted transducers, you need to poll it. This plugin also
warns when a reading is within 5 percent of crossing its threshold.
This depends on another program, airtalk_get_readings, which runs under
cron and actually retrieves the readings. Also see
airtalk_get_one_reading and airtalk.cron.
- check_akcp_sp2
- Check an AKCP environmental probe for temperature or humidity.
- check_apc_ups
- Check an APC UPS.
- check_aruba
- Check an Aruba wireless controller or AP.
This needs the UD::SNMP Perl module.
- check_bgp_neighbor
- Check BGP neighbors on a router.
- check_bmc
- Check the Baseboard Management Controller on a Sun Solaris i86pc system
for hardware faults.
- check_clearpass
- Check an Aruba Clearpass server/appliance.
- check_cluster_livestatus
- Check a "cluster" of other Nagios service checks via Livestatus.
- check_conman
- Check TWS/Maestro conman status. Used by check_maestro.
Uses a config file.
- check_dell_idrac
- Check a Dell iDRAC service processor via IPMI.
- check_dell_idrac8
- Check a Dell iDRAC8 service processor via IPMI.
- check_dhcp_leases
- Check on a DHCP sharednet's address utilization.
- check_dircount.c
- Count the files in a directory. Useful for monitoring mail or print spools.
- check_disk_error_rates
- Check iostat -e for new disk errors.
- check_elom
- Check a Sun Embedded Lights Out Manager service processor for hardware
faults. Uses SNMP over the management network interface.
- check_emerson_netsure
- Check on an Emerson Netsure -48V DC power supply.
- check_exchange_health.pl
- Check Exchange mail server health.
- check_filesize
- Monitor the size of a file, e.g. a log file.
- check_fmdump
- Check the output of the Solaris 10 or 11 fault manager daemon.
- check_google_spf
- Check Google's SPF record.
- check_grouper
- Check Tomcat grouper status page.
- check_hdadm
- Check Sun Fishworks hdadm drive SMART status.
- check_host_via_ping_service
- Check a host's up/down status by examining the status of recent ping
service attempts to the host. The normal host check is just a ping.
This lets you bypass that and use the result of your last ping service
check.
- check_hrStorage
- Check a remote filesystem via SNMP hrStorage.
- check_https_content
- The standard Nagios check_http plugin can't check an HTTPS site's SSL
certificate and its page content at the same time. This wrapper runs it
twice to do the two tests.
- check_ifs_up
- Check if a router's interfaces are up.
- check_ilom
- Check a Sun ILOM service processor.
- check_ipf
- Check Sun ipfilter status.
- check_iptables
- Check Linux iptables status.
- check_juniper_firmware
- Check a Juniper router or switch's firmware, including backup boot.
- check_juniper_lacp
- Check LACP protocol and LAG links on a Juniper router or switch.
- check_juniper_optics
- Check SFP/SFP+/XFP optics on a Juniper router or switch, including low RX optical power.
- check_license_file
- Check a FlexLM license file.
- check_liebert_crac
- Check a Liebert computer room air conditioner.
- check_liebert_ups
- Check a Liebert UPS.
- check_lmstat
- Run lmstat and check output for errors.
- check_log_rate
- Check how fast a log file is growing.
- check_maestro
- Check TWS/Maestro. Uses check_conman.
- check_memory_error_rates
- Check Solaris memory error rates, from dmesg.
- check_metastat
- Check Sun Solaris metastat output for software RAID errors.
- check_mounts
- Check that all filesystems are properly mounted. Sometimes, when Linux
sees enough disk errors, it will switch a mount to read-only.
- check_mqueues.sh
- Check a bunch of mail queues.
- check_nagios_latency
- Check the Nagios server's latency.
- check_netapp
- Check a NetApp filer.
- check_netdev
- Check a few things on a generic SNMP network device.
- check_nf_conntrack
- Check the number of Linux iptables connection tracking (nf_conntrack) state entries.
- check_nrpe_via
- Check an NRPE command on a remote host, via an intermediate host.
This lets you "jump" an NRPE query through a "proxy" host to reach a host
that you normally can't connect to.
- check_ntp
- Check NTP server.
- check_orca
- Check that Orca RRDs are updating properly.
- check_powerdsine
- Check a PowerDsine midspan Power over Ethernet injector, models 6000 and 6500. Note: the 6000s often get confused and stop responding to SNMP.
- check_proc_pgrep
- Check a process using pgrep.
- check_prtdiag
- Check Sun Solaris prtdiag -v output for hardware faults. prtdiag's
output was intended for use by humans and has no consistent format.
It's different on almost every combination of platform, os revision,
and patch level. So this is a big, ugly pile of regexps.
- check_ps_multi.pl
- Check on a bunch of processes. Uses a remote config file.
- check_sakai_login
- Log into Sakai (Java-based course management software), check the
page, log back out, and complain if it takes too long.
- check_sonet_errors
- Check a Cisco/Cerent 15454 for sonet errors, via snmp.
- check_stp
- Check on a switch's spanning tree.
- check_switch
- Check a bunch of things about a network switch or router.
- check_sysUpTime
- Check SNMP sysUpTime and complain if the device has rebooted.
- check_systemd
- Check if Linux systemd has any errors.
- check_temp_snmp
- Check Cisco router and switch temperature sensors.
- check_tftp
- Check a TFTP server.
- check_x24
- Check a Xyratex x24 disk array controller.
- check_xscf
- Check an M4000/M5000 XSCF service processor.
- check_yum
- Run yum check-update to look for available Linux updates.
- check_zpool
- Check ZFS disk pool.
- count_log_lines.c
- Count the lines added to a log file in the last 5 minutes. This works
by seeking back about 1MB from the end of the file, and parsing the
dates in the log file lines. When you've got a multi-gigabyte log
file, this is much faster than wc.
- downtime_matching
- Schedule downtime on matching hosts and services, cli version.
- downtime_matching.cgi
- Schedule downtime on matching hosts and services, web cgi version.
- gen_flat_status_page
- Generate a single, flat, simple HTML page with all current Nagios
status info. Having multiple workstations display this is much easier on
the server than having them all use the status CGI or Thruk.
- immediately_check_matching
- Schedule an immediate check of matching hosts and services.
- immediately_check_service_on_all_hosts
- Send commands to Nagios to immediately check a specified service on
all hosts where it's defined.
- log_ctime
- Handy utility to convert Nagios log timestamps from seconds since the epoch to something more human-readable. Reads from stdin, writes to stdout. For example:
grep myhost var/nagios.log | log_ctime
- notify_by_epager
- Rate-limit pager notifications so a power outage doesn't send 400 pages to the on-call guy.
- passive_checks
- Some checks take too long to run from inside Nagios. This runs a
configurable list of plugins and feeds the results to Nagios as passive
checks via the command pipe or NSCA.
Uses a config file.
- rm_matching_comments
- Bulk remove comments that match criteria, cli version.
- rm_matching_comments.cgi
- Bulk remove comments that match criteria, web cgi version.
- show_service_output
- Show matching hosts/services with plugin output.
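The log_ctime entry above converts the epoch timestamps at the start of Nagios log lines. Below is a minimal Python sketch of such a filter. It assumes each line begins with a bracketed epoch stamp like `[1700000000]`, which is the format nagios.log uses. This is not the actual log_ctime source, and the function names are illustrative.

```python
import re
import sys
import time

# nagios.log lines start with the event time in seconds since the epoch,
# e.g. "[1700000000] SERVICE ALERT: web1;HTTP;CRITICAL;..."
STAMP = re.compile(r"^\[(\d+)\]")


def convert_line(line: str) -> str:
    """Replace a leading [epoch] stamp with a local, human-readable time.

    Lines without a stamp are returned unchanged.
    """
    return STAMP.sub(lambda m: "[" + time.ctime(int(m.group(1))) + "]", line, count=1)


def main() -> None:
    # Filter: read log lines on stdin, write converted lines to stdout.
    for line in sys.stdin:
        sys.stdout.write(convert_line(line))
```

Calling main() from a small script (the name log_ctime.py is hypothetical) gives the same pipeline as the original: `grep myhost var/nagios.log | log_ctime.py`. time.ctime() formats in the local time zone, which is usually what you want when reading logs.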
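Plugins like check_age report through the standard Nagios plugin exit codes: 0 for OK, 1 for WARNING, 2 for CRITICAL, and 3 for UNKNOWN. They also print one line of status text. Below is a rough Python sketch of a file-age check in that style. It assumes warning and critical thresholds in seconds. The argument order and message text are hypothetical and do not reflect check_age.c's actual interface.

```python
import os
import time

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_age(path, warn_secs, crit_secs, now=None):
    """Return (exit_code, message) based on how long ago `path` was modified."""
    now = time.time() if now is None else now
    try:
        mtime = os.stat(path).st_mtime
    except OSError as e:
        return UNKNOWN, f"AGE UNKNOWN - cannot stat {path}: {e.strerror}"
    age = int(now - mtime)
    if age >= crit_secs:
        return CRITICAL, f"AGE CRITICAL - {path} is {age} seconds old"
    if age >= warn_secs:
        return WARNING, f"AGE WARNING - {path} is {age} seconds old"
    return OK, f"AGE OK - {path} is {age} seconds old"


def main(argv):
    """Hypothetical CLI: check_age.py FILE WARN_SECONDS CRIT_SECONDS."""
    if len(argv) != 4:
        print("AGE UNKNOWN - usage: check_age.py FILE WARN_SECONDS CRIT_SECONDS")
        return UNKNOWN
    code, msg = check_age(argv[1], int(argv[2]), int(argv[3]))
    print(msg)  # Nagios shows the first line of stdout as the status text
    return code
```

A script entry point would call `sys.exit(main(sys.argv))` so that Nagios sees the exit code.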
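The count_log_lines entry describes a tail-seek trick. Instead of reading a multi-gigabyte file from the start, it seeks to roughly 1MB before the end. It then drops the partial line it landed in and counts the lines whose timestamps fall in the last 5 minutes. Below is a Python sketch of that idea. It assumes nagios.log-style `[epoch]` line prefixes. The real tool parses whatever date format your log uses; parse_epoch here is a hypothetical stand-in.

```python
import os
import time

WINDOW = 5 * 60            # count lines from the last 5 minutes
TAIL_BYTES = 1024 * 1024   # how far back from EOF to start reading


def parse_epoch(line: bytes):
    """Hypothetical parser: return the leading [epoch] stamp as an int, or None."""
    if line.startswith(b"["):
        end = line.find(b"]")
        if end > 1 and line[1:end].isdigit():
            return int(line[1:end])
    return None


def count_recent_lines(path, now=None, parse=parse_epoch):
    """Count lines stamped within WINDOW seconds of `now`, reading only the tail."""
    now = time.time() if now is None else now
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        start = max(0, size - TAIL_BYTES)
        f.seek(start)
        if start > 0:
            f.readline()  # discard the (probably partial) line we landed in
        count = 0
        for line in f:
            ts = parse(line)
            if ts is not None and now - ts <= WINDOW:
                count += 1
    return count
```

This design has one consequence. If more than 1MB was written in the last 5 minutes, the lines before the seek point are never seen and the count comes out low. TAIL_BYTES therefore has to be sized for your peak log rate.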
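notify_by_epager exists to cap the number of pages sent during a mass outage. Below is a Python sketch of one way to do that. Each Nagios notification runs as a separate process, so the sketch keeps the timestamps of recently sent pages in a state file and refuses to page once a sliding window is full. The limit of 10 pages per 15 minutes and the JSON state file are assumptions, not notify_by_epager's actual behavior.

```python
import json
import time

MAX_PAGES = 10          # assumption: at most this many pages...
WINDOW_SECS = 15 * 60   # ...per 15-minute sliding window


def should_send(state_path, now=None, max_pages=MAX_PAGES, window=WINDOW_SECS):
    """Record this page if allowed, and return True if it may be sent.

    The timestamps of recently sent pages live in a small JSON state file,
    because every notification is a separate process.
    """
    now = time.time() if now is None else now
    try:
        with open(state_path) as f:
            sent = json.load(f)
    except (OSError, ValueError):
        sent = []  # no state yet, or unreadable state: start fresh
    # Keep only pages sent inside the sliding window.
    sent = [t for t in sent if now - t < window]
    allowed = len(sent) < max_pages
    if allowed:
        sent.append(now)
    with open(state_path, "w") as f:
        json.dump(sent, f)
    return allowed
```

This sketch does no file locking. Two notifications firing at the same instant could both read the old state. A production version would serialize access, for example with fcntl.flock on the state file.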
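passive_checks feeds results back to Nagios through the external command file. Nagios accepts passive service results as a line of the form `[time] PROCESS_SERVICE_CHECK_RESULT;host;service;return_code;output`. Below is a Python sketch that runs one plugin and formats its result. It assumes a 300-second plugin timeout and keeps only the first line of output. The function names are hypothetical. The real passive_checks can also submit via NSCA, which is not shown.

```python
import subprocess
import time

UNKNOWN = 3


def run_plugin(argv, timeout=300):
    """Run a plugin; return (exit_code, first line of its output)."""
    try:
        p = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired) as e:
        return UNKNOWN, f"UNKNOWN - could not run {argv[0]}: {e}"
    # Anything outside 0-3 is not a valid plugin state; report it as UNKNOWN.
    code = p.returncode if p.returncode in (0, 1, 2, 3) else UNKNOWN
    lines = p.stdout.splitlines()
    return code, lines[0] if lines else "(no output)"


def passive_result(host, service, code, output, now=None):
    """Format a PROCESS_SERVICE_CHECK_RESULT external command line."""
    now = int(time.time() if now is None else now)
    return f"[{now}] PROCESS_SERVICE_CHECK_RESULT;{host};{service};{code};{output}\n"


def submit(command_file, line):
    """Write one command to the Nagios command file (a named pipe)."""
    with open(command_file, "w") as f:
        f.write(line)
```

The command file path is set by `command_file` in nagios.cfg. On a typical source install it is something like /usr/local/nagios/var/rw/nagios.cmd, but check your own configuration.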
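The *_matching tools (ack, downtime, immediate check) all come down to the same pattern: select hosts and services by pattern, then write a Nagios external command for each match. Below is a Python sketch of that pattern for services. It assumes the (host, service) list has already been read from status.dat or Livestatus, which is not shown. The command names are Nagios's standard ACKNOWLEDGE_SVC_PROBLEM, SCHEDULE_SVC_DOWNTIME and SCHEDULE_FORCED_SVC_CHECK. The helper names and the chosen flag values are illustrative.

```python
import re


def matching(services, host_re, svc_re):
    """Yield (host, service) pairs whose names match both patterns (re.search)."""
    h, s = re.compile(host_re), re.compile(svc_re)
    for host, svc in services:
        if h.search(host) and s.search(svc):
            yield host, svc


def ack_cmd(host, svc, author, comment, now):
    # sticky=2 keeps the ack until the service recovers; notify=1; persistent=0
    return f"[{now}] ACKNOWLEDGE_SVC_PROBLEM;{host};{svc};2;1;0;{author};{comment}"


def downtime_cmd(host, svc, start, end, author, comment, now):
    # fixed=1 (runs exactly start..end), trigger_id=0 (not triggered);
    # duration only matters for flexible downtime but must still be supplied
    return (f"[{now}] SCHEDULE_SVC_DOWNTIME;{host};{svc};{start};{end};"
            f"1;0;{end - start};{author};{comment}")


def forced_check_cmd(host, svc, now):
    # Check at time `now`, regardless of normal scheduling restrictions
    return f"[{now}] SCHEDULE_FORCED_SVC_CHECK;{host};{svc};{now}"
```

Each generated line is then written, newline-terminated, to the Nagios command file. Acknowledgements only make sense for services that are currently in a problem state, so a real tool would also filter on the current state.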
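check_sysUpTime complains when a device reboots. sysUpTime is an SNMP TimeTicks value in hundredths of a second, and it wraps to zero after 2^32 ticks (about 497 days). A plain "did it go down" comparison can therefore false-alarm on a long-running device. Below is a Python sketch of one approach: predict the new value from the elapsed wall-clock time, and flag a reboot when the observed value is far from the prediction. It assumes the plugin saves the previous reading and poll time between runs (storage not shown). The 60-second slack is an arbitrary choice.

```python
TICKS_PER_SEC = 100
WRAP = 2 ** 32  # TimeTicks is a 32-bit counter


def rebooted(prev_ticks, prev_time, cur_ticks, cur_time, slack_secs=60):
    """Return True if the device appears to have restarted between two polls.

    prev_time and cur_time are the poller's wall-clock times in seconds.
    """
    elapsed = (cur_time - prev_time) * TICKS_PER_SEC
    expected = (prev_ticks + elapsed) % WRAP
    # Distance between predicted and observed uptime, allowing for wraparound.
    diff = abs(cur_ticks - expected)
    diff = min(diff, WRAP - diff)
    return diff > slack_secs * TICKS_PER_SEC
```

With this design, a counter wrap with no reboot lands right where it was predicted and stays quiet. A genuine restart shows up as a large mismatch.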
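check_mounts watches for Linux remounting a filesystem read-only after disk errors. On Linux, /proc/mounts lists the current mounts one per line, with the fields device, mount point, type, options, and two more. Below is a Python sketch of the check. It assumes the expected mount points are passed on the command line. That interface is hypothetical; the real plugin may take its list from fstab or a config.

```python
OK, CRITICAL = 0, 2


def parse_mounts(text):
    """Map mount point -> list of mount options, from /proc/mounts-style text.

    Each line is: device mountpoint fstype options dump pass
    """
    mounts = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 4:
            mounts[fields[1]] = fields[3].split(",")
    return mounts


def check_mounts(expected, text):
    """CRITICAL if an expected mount point is missing or mounted read-only."""
    mounts = parse_mounts(text)
    missing = [m for m in expected if m not in mounts]
    readonly = [m for m in expected if m in mounts and "ro" in mounts[m]]
    problems = []
    if missing:
        problems.append("not mounted: " + " ".join(missing))
    if readonly:
        problems.append("read-only: " + " ".join(readonly))
    if problems:
        return CRITICAL, "MOUNTS CRITICAL - " + "; ".join(problems)
    return OK, "MOUNTS OK - " + " ".join(expected) + " mounted read-write"


def main(argv):
    """Hypothetical CLI: check_mounts.py MOUNTPOINT [MOUNTPOINT ...]"""
    with open("/proc/mounts") as f:
        code, msg = check_mounts(argv[1:] or ["/"], f.read())
    print(msg)
    return code
```

/proc/mounts escapes spaces in mount points as \040, so paths containing spaces would need unescaping before comparison.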
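check_https_content runs check_http twice, once for the certificate and once for the page content, and then has to merge the two results. Below is a Python sketch of such a wrapper. It assumes the check_http options -H, -S, -C (minimum certificate validity in days), -u (URL path) and -s (expected string). The plugin path is hypothetical. To pick the worse result, the sketch ranks CRITICAL over WARNING over UNKNOWN over OK. That ordering is one reasonable choice, not necessarily the one the real wrapper makes.

```python
import subprocess

# Hypothetical path; adjust to wherever check_http is installed.
CHECK_HTTP = "/usr/lib64/nagios/plugins/check_http"

# Assumed severity order when merging: CRITICAL > WARNING > UNKNOWN > OK.
SEVERITY = {0: 0, 3: 1, 1: 2, 2: 3}


def worst(codes):
    """Return the most severe Nagios exit code; unexpected codes count as UNKNOWN."""
    normalized = [c if c in SEVERITY else 3 for c in codes]
    return max(normalized, key=SEVERITY.__getitem__)


def first_line(text):
    lines = text.strip().splitlines()
    return lines[0] if lines else "(no output)"


def check_https_content(host, path, expect, cert_days=30):
    """Run check_http once for the certificate and once for the content."""
    runs = [
        [CHECK_HTTP, "-H", host, "-S", "-C", str(cert_days)],      # cert expiry
        [CHECK_HTTP, "-H", host, "-S", "-u", path, "-s", expect],  # page content
    ]
    codes, outputs = [], []
    for argv in runs:
        p = subprocess.run(argv, capture_output=True, text=True)
        codes.append(p.returncode)
        outputs.append(first_line(p.stdout))
    return worst(codes), " / ".join(outputs)
```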