Extending Net-SNMP for infrastructure monitoring

Monitoring your infrastructure is a crucial task and, regardless of the software you've chosen, it usually means making use of the Simple Network Management Protocol (SNMP). The protocol is built into many networked devices, such as routers, switches, servers, firewalls, and wireless access points, all reachable via their IP address. It provides a common mechanism for network devices to relay management information within single and multi-vendor LAN or WAN environments.

So whether you would like to be informed of the state of your Loadbalancer.org appliance, a Windows Server, or a Cisco switch, it can all be done with the same protocol.

Net-SNMP is possibly the most popular and most advanced implementation of SNMP. It provides the tools to query SNMP-enabled devices, as well as the agent daemon (snmpd) itself.

snmpwalk -v 2c -c public localhost

Issuing the above command on a Loadbalancer.org appliance with SNMP enabled will output quite a few lines to your terminal (N.B. if it works with the public community string, please change it, as it is highly insecure!).

In fact, there are 4551 lines of output, each corresponding to a different property that can be queried. However, despite the vast number of properties, not every scenario you may need is covered.
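To get a feel for where all those lines come from, the walk output can be grouped by MIB module. The following is only a minimal sketch: the function name count_by_mib is hypothetical, and it assumes the default snmpwalk output style in which every line starts with MODULE::object.

```shell
#!/bin/sh
# count_by_mib: read snmpwalk output on stdin and print "count module" pairs,
# most frequent module first.
# Assumes lines such as "SNMPv2-MIB::sysDescr.0 = STRING: ..."; lines without
# a "::" separator (e.g. purely numeric OIDs) are ignored.
count_by_mib() {
    awk -F'::' 'NF > 1 { seen[$1]++ } END { for (m in seen) print seen[m], m }' |
        sort -k1,1nr -k2,2
}

# Usage (needs a running snmpd and the community string used above):
#   snmpwalk -v 2c -c public localhost | count_by_mib
```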
Luckily, Net-SNMP lets us extend its functionality in various ways, for instance by writing your own SNMP agent, like this open-source HAProxy agent. That requires some programming knowledge, though. There are also easier ways of extending the SNMP daemon.

Firstly, we can have snmpd watch a log file of our choice. It is very easy to do:

logmatch hb_warning /var/log/ha.log 10 WARN

Adding the line above to your snmpd.conf file will create a new object that can be queried (snmpd needs to be reloaded first so that it reads the new config file).
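A small helper can make that edit repeatable. This is a sketch under assumptions: add_directive is a hypothetical function name, and both the /etc/snmp/snmpd.conf path and the reload command in the usage comment depend on your system.

```shell
#!/bin/sh
# add_directive FILE LINE: append LINE to FILE unless an identical line exists.
add_directive() {
    conf=$1
    line=$2
    # -q: quiet, -x: match the whole line, -F: treat LINE as a fixed string
    if grep -qxF -- "$line" "$conf" 2>/dev/null; then
        echo "already present"
    else
        printf '%s\n' "$line" >> "$conf"
        echo "added"
    fi
}

# Usage (as root; path and reload command are assumptions):
#   add_directive /etc/snmp/snmpd.conf 'logmatch hb_warning /var/log/ha.log 10 WARN'
#   systemctl reload snmpd    # or restart the service if reload is not supported
```

Checking for an existing line first avoids duplicate entries when the helper is run more than once.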

Let me explain the particular values found in this new line:

  • logmatch is the config directive in Net-SNMP.
  • hb_warning is the name of your choice for the new entry.
  • /var/log/ha.log is, in my case, the path to the heartbeat log file.
  • 10 is how often the file is checked, expressed in seconds.
  • WARN is the regular expression to be matched.
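Before adding such a line, it can help to check what the pattern actually matches in the log. The sketch below uses grep as a rough stand-in; count_matches is a hypothetical name, and grep's extended regex syntax may not behave identically to the matching done by snmpd.

```shell
#!/bin/sh
# count_matches REGEX FILE: print how many lines of FILE match REGEX.
# This only approximates what snmpd would count for the same pattern.
count_matches() {
    grep -c -E -- "$1" "$2"
}

# Usage:
#   count_matches WARN /var/log/ha.log
```

Once snmpd has reloaded the config, the counters for the entry should be visible with something like snmpwalk -v 2c -c public localhost UCD-SNMP-MIB::logMatchTable.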

If the regular expression is matched, a Success message will be returned, which means you should set up a warning in your monitoring software for when such a message is received. Although this is simple to implement, there isn't much flexibility in this solution, and in this scenario Success doesn't really sound appropriate. It also requires a separate line like this for every value we want to check for, in order to monitor different events.
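If several keywords need watching, the repeated lines can be generated rather than typed. This is a minimal sketch: gen_logmatch is a hypothetical helper, the hb_ name prefix simply follows the example above, and ERROR and CRIT are made-up keywords.

```shell
#!/bin/sh
# gen_logmatch FILE INTERVAL KEYWORD...: print one logmatch line per keyword.
# The entry name is "hb_" plus the lower-cased keyword.
gen_logmatch() {
    file=$1
    interval=$2
    shift 2
    for kw in "$@"; do
        name="hb_$(printf '%s' "$kw" | tr '[:upper:]' '[:lower:]')"
        printf 'logmatch %s %s %s %s\n' "$name" "$file" "$interval" "$kw"
    done
}

# Usage (ERROR and CRIT are hypothetical keywords):
#   gen_logmatch /var/log/ha.log 10 WARN ERROR CRIT
```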

Thankfully, by adding just a single line to the config file you can add new functionality to your snmpd.

Another option is to use a different Net-SNMP feature. The Net-SNMP agent provides an extension MIB (NET-SNMP-EXTEND-MIB) that can be used to run arbitrary commands and shell scripts and query their output.

This allows for quite a lot of flexibility if an appropriate script is written. A simple example is:

extend hb_log /usr/bin/tail -n 1 /var/log/ha.log

In this case:

  • extend is the config directive providing the functionality.
  • hb_log is the name of your choice for the new entry.
  • /usr/bin/tail -n 1 is the command that we are executing (it must be a full path).
  • /var/log/ha.log is the argument for the command, which in this case is the log file path.

This will simply return the last line of the heartbeat log to the monitoring software, where again you could choose particular values that trigger warnings, etc.
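To read the value back by hand, the entry name is used as the index into the extend table. The sketch below assumes the hb_log entry above and the public community string on localhost; extend_oid and query_extend are hypothetical helper names.

```shell
#!/bin/sh
# extend_oid NAME: print the OID holding the first output line of an extend entry.
extend_oid() {
    printf 'NET-SNMP-EXTEND-MIB::nsExtendOutput1Line."%s"\n' "$1"
}

# query_extend NAME: fetch that value with snmpget.
# HOST and COMMUNITY can be overridden through the environment.
query_extend() {
    snmpget -v 2c -c "${COMMUNITY:-public}" "${HOST:-localhost}" "$(extend_oid "$1")"
}

# Usage:
#   query_extend hb_log
```

The same table also exposes nsExtendOutputFull (the complete output) and nsExtendResult (the exit status of the command).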
More can be achieved with a shell script, since specific messages can be defined for particular keywords appearing in the log file.
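As a rough illustration of that idea, a script could map keywords in the latest log line to short status messages. In this sketch, classify_line is a hypothetical function, WARN is taken from the earlier example, and ERROR is an assumed extra keyword.

```shell
#!/bin/sh
# classify_line: read one log line on stdin and print a short status message.
# WARN comes from the earlier example; ERROR is a hypothetical extra keyword.
classify_line() {
    IFS= read -r line || true
    case $line in
        *ERROR*) echo "CRITICAL: $line" ;;
        *WARN*)  echo "WARNING: $line" ;;
        *)       echo "OK" ;;
    esac
}

# Usage:
#   tail -n 1 /var/log/ha.log | classify_line
```

The monitoring software then only has to look at the prefix of the returned string instead of parsing the raw log line.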

As mentioned above, the same extend directive can also run shell scripts to provide new functionality. Below is a very simple example that will tell us if both of your Loadbalancer.org nodes in the pair are Active:

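#!/bin/sh

# Compare the local and remote node status files.
# cmp -s prints nothing; it exits 0 if the files are identical and 1 if they differ.
cmp -s /var/log/nodestatus_local /var/log/nodestatus_remote

if [ $? -eq 1 ]; then
	echo "All is fine"
else
	echo "Both nodes may be active"
fi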

both_active.sh

In order to add this functionality to our snmpd, we need to append the below line to our config:

extend both_active /bin/sh /root/both_active.sh
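One thing the script above does not distinguish is a failed comparison: cmp exits with a status greater than 1 when a file cannot be read, which ends up in the same branch as identical files. A possible variant, as a sketch only (compare_status and the Unknown message are hypothetical), could look like this:

```shell
#!/bin/sh
# compare_status LOCAL REMOTE: compare two node status files.
# cmp -s exits 0 if identical, 1 if different, >1 on error (e.g. missing file).
compare_status() {
    rc=0
    cmp -s "$1" "$2" || rc=$?
    case $rc in
        1) echo "All is fine" ;;
        0) echo "Both nodes may be active" ;;
        *) echo "Unknown: could not read status files" ;;
    esac
}

# Usage:
#   compare_status /var/log/nodestatus_local /var/log/nodestatus_remote
```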

Having focused a lot on Heartbeat in the previous examples, I would also like to show you some examples with other programs.

For instance, whenever you make any changes to the Layer 7 configuration on your Loadbalancer.org appliance, HAProxy needs to be reloaded. Whilst this usually isn't a problem, it may lead to old HAProxy processes lingering for longer periods of time.

The script below will let you know how many HAProxy processes are currently running:

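#!/bin/sh

# Count the running haproxy processes.
NUMPIDS=$(pgrep haproxy | wc -l)

# Print the count so it appears in the extend output, and also return it
# as the exit status (which can only hold values from 0 to 255).
echo "$NUMPIDS"
exit "$NUMPIDS"

haproxy_pids.sh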
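A bare number leaves the interpretation to the monitoring side. If you would rather have the script say what the number means, something along these lines could work; haproxy_status is a hypothetical function, and the threshold of 4 in the usage line is an arbitrary assumption.

```shell
#!/bin/sh
# haproxy_status COUNT MAX: print a status line for a given process count.
# MAX is a hypothetical threshold; pick one that suits how often you reload.
haproxy_status() {
    count=$1
    max=$2
    if [ "$count" -eq 0 ]; then
        echo "CRITICAL: no haproxy processes running"
    elif [ "$count" -gt "$max" ]; then
        echo "WARNING: $count haproxy processes running (more than $max)"
    else
        echo "OK: $count haproxy processes running"
    fi
}

# Usage:
#   haproxy_status "$(pgrep haproxy | wc -l)" 4
```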

Another idea for such a script, not necessarily for your Loadbalancer.org appliance, is finding zombie processes:

#!/bin/sh

# List zombie processes (state contains "Z") with their command and PID.
ZOMBIE=$(ps axo pid=,stat=,command= | awk '$2 ~ /Z/ { print "Zombie process " $3, "with PID " $1 }')
# Count the reported zombies, one per line.
ZOMBIE_COUNT=$(printf '%s' "$ZOMBIE" | grep -c Zombie)

if [ "$ZOMBIE_COUNT" -eq 0 ]; then
	printf '%s\n' "No zombie processes found"
else
	printf '%s\n' "$ZOMBIE"
fi

find_zombies.sh
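The last two scripts still need their own extend lines before snmpd will run them. A small sketch that prints those lines is shown here; gen_extend is a hypothetical helper, and the /root paths in the usage comment are an assumption that follows the earlier both_active.sh example.

```shell
#!/bin/sh
# gen_extend SCRIPT...: print one extend line per script path.
# The entry name is the file name without its .sh suffix.
gen_extend() {
    for script in "$@"; do
        name=$(basename "$script" .sh)
        printf 'extend %s /bin/sh %s\n' "$name" "$script"
    done
}

# Usage (the /root paths are an assumption):
#   gen_extend /root/haproxy_pids.sh /root/find_zombies.sh
```

The printed lines can then be appended to snmpd.conf, followed by a reload of snmpd.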

I hope these ideas fit some of your scenarios, or at least give you some inspiration for what else can be implemented.
