In the IT world, performance and uptime monitoring are like bridge tenders: no one pays much attention to them until something goes wrong. That is an approach more businesses should rethink. A solid monitoring procedure can yield significant benefits for intelligence gathering and decision making, and it helps optimize daily operations.
One of the first places to focus when developing a monitoring system is server and process uptime. As most of us have experienced firsthand with home computers, the fact that a computer is up and operational is no guarantee that the program you need is running. Identify all the critical applications it takes to deliver your web site and other services. These include your HTTP server software, DNS, inbound and outbound mail server applications, authentication servers, and anything else particular to your environment. Your process and uptime monitoring scheme should also include the IP addresses of internal and external server interfaces, routers and switches, network-attached storage devices, and any other important hardware with an interface reachable by ping or other network monitoring protocols.
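As a rough illustration of the point that a running server does not guarantee a running service, here is a minimal Python sketch that tests whether each critical service accepts TCP connections. The host names and ports in `SERVICES` are hypothetical placeholders, not a recommended inventory, and a successful connection only shows that something is listening on the port, not that the application behind it is healthy.

```python
import socket

# Hypothetical inventory of critical services: name -> (host, port).
# Replace these placeholders with the hosts and ports in your own environment.
SERVICES = {
    "http": ("www.example.com", 80),
    "dns": ("ns1.example.com", 53),
    "smtp": ("mail.example.com", 25),
}


def check_tcp(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def check_services(services, timeout=3.0):
    """Map each service name to True (accepting connections) or False."""
    return {
        name: check_tcp(host, port, timeout)
        for name, (host, port) in services.items()
    }
```

Calling `check_services(SERVICES)` on a schedule and alerting on any `False` value gives a basic process-level check to sit alongside a plain ping of the host.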
It is also helpful for administrators to have visibility into the other systems that support your web site and network applications, for example the temperature, humidity, and power in the environment where your infrastructure is deployed. If you peer with multiple backbone providers, you’ll want to know the health of those network connections. You’ll also want to monitor resources like processor load, memory usage, and available storage space. When it comes to monitoring, more is almost always better, and the information you gain from monitoring has a hidden value.
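Resource monitoring of this kind can start very simply. The sketch below, which assumes a Unix-like host and uses only the Python standard library, samples disk usage and processor load and flags any reading above a limit. The threshold values and the names `THRESHOLDS`, `collect`, and `over_threshold` are illustrative assumptions; memory usage is left out because the standard library has no portable call for it.

```python
import os
import shutil

# Hypothetical alert thresholds; tune these for your own environment.
THRESHOLDS = {"disk_used_pct": 90.0, "load_per_cpu": 1.5}


def collect(path="/"):
    """Gather a few resource readings (Unix-like hosts only)."""
    usage = shutil.disk_usage(path)   # total, used, and free bytes
    load_1min = os.getloadavg()[0]    # 1-minute load average
    cpus = os.cpu_count() or 1
    return {
        "disk_used_pct": 100.0 * usage.used / usage.total,
        "load_per_cpu": load_1min / cpus,
    }


def over_threshold(readings, thresholds=THRESHOLDS):
    """Return the sorted names of readings that exceed their threshold."""
    return sorted(
        name
        for name, value in readings.items()
        if name in thresholds and value > thresholds[name]
    )
```

Running `over_threshold(collect())` periodically and recording the raw readings, not just the alerts, builds up the kind of history that later supports capacity and purchasing decisions.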