Wednesday, June 12, 2013

vCenter Operations Manager (vCOPS): What It Is and Why You Should Use It

For years, vSphere has had alarms that notify you when a metric goes above or below a threshold you specify. The problem is that these thresholds are static: you set a value, such as CPU utilization >75%, and are notified whenever the metric crosses it. While useful, this can lead to many false alarms if a virtual machine (VM) routinely exceeds that value or spikes to it for a while during a batch-processing interval. vCenter Operations Manager (vCOPS) lets the computer do what it does best: monitoring, figuring out what is "normal", and notifying administrators when things are abnormal.

For years, vSphere has had alarms to let you know when things are above or below thresholds you specify. This is a great first step in identifying items that may require your attention or further investigation. The problem is that these thresholds are static: you set a value, such as CPU utilization >75%, and are notified whenever the metric crosses it. While useful, this can lead to many false alarms if a virtual machine (VM) routinely exceeds that value or spikes to it for a while during a batch-processing interval. In those cases, that level of CPU utilization is expected and normal; investigating it again wastes time and soon leads to ignoring alarms as probable false positives. In addition, a high-utilization threshold fails to alert you when utilization is below normal, such as when a service failure causes processing to stop. Knowing when you have a "real" issue is the key: once alarms are routinely dismissed, a genuine problem may be ignored along with the false ones. Case studies describe companies that spent a year tuning thresholds to find the "right" values for their VMs. That is a huge waste of time and money, and it does not work well in a dynamic environment.
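The two failure modes described above can be seen in a small sketch. This is not vSphere code; the function name and the hourly CPU readings are made up for illustration, and only the 75% value comes from the example in the text.

```python
STATIC_CPU_THRESHOLD = 75.0  # percent; the example value used in the text


def static_alarms(samples, threshold=STATIC_CPU_THRESHOLD):
    """Return the indices of samples that exceed a fixed threshold."""
    return [i for i, cpu in enumerate(samples) if cpu > threshold]


# Hypothetical hourly CPU readings (percent) for two VMs.
batch_vm = [20, 22, 90, 92, 91, 21]   # nightly batch job: high CPU is expected
stalled_vm = [60, 62, 61, 2, 1, 1]    # service fails at hour 3: CPU drops near zero

print(static_alarms(batch_vm))    # [2, 3, 4]  alarms for normal batch activity
print(static_alarms(stalled_vm))  # []         the real failure raises no alarm
```

The batch VM raises three alarms for behavior that is normal for it, while the VM whose service has stopped raises none, because a fixed upper limit knows nothing about what each VM usually does.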

Another issue for many deployments is wasted resources: VMs that are over-provisioned, powered off, or even removed from the inventory but still on disk. This waste drives up the Total Cost of Ownership (TCO). The question is how to identify those resources and get them back. Conversely, other machines may not have the resources they need to run well. Which ones are they, and what do they need? And which VMs are consuming so many resources that they affect the performance of other VMs?

All of these questions, and many others, can be answered by carefully studying performance graphs in the vSphere Client (or the new Web Client) and monitoring alarms. The problem is that these tasks are time-intensive, and administrator time is both expensive and at a premium. Enter vCenter Operations Manager (vCOPS). Let the computer do what it does best: monitoring, figuring out what is "normal", and notifying administrators when things are abnormal.

vCOPS is a tool from VMware that is designed to analyze your environment, figure out what is "normal", and alert you when abnormalities occur. These abnormalities can be at the VM, host, cluster, or datastore level. vCOPS is designed to help you find both undersized and oversized VMs as well as wasted resources. It can help you spot issues early, and it provides root-cause analysis for the issues it detects. If you also have vCenter Configuration Manager (vCM, part of the vCOPS Management Suite), it can correlate events that occurred in the environment with their effects on the VM, host, and so on. The tool gathers data over time, sets thresholds dynamically, and reports when they are exceeded. vCOPS is designed to alert you when things are not normal and not to bother you about small events that are normal (and probably common) in your environment.
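To make the idea of a learned, dynamic threshold concrete, here is a minimal sketch. It is not vCOPS's actual algorithm, which this article does not describe; it assumes a simple rule (mean plus or minus three standard deviations, tracked per hour of day), and the function names and sample data are hypothetical.

```python
from statistics import mean, stdev


def learn_baseline(history, k=3.0):
    """Build a (low, high) band per hour of day from past readings.

    history maps hour-of-day -> list of CPU percentages seen at that hour.
    The band is mean +/- k standard deviations (an assumed rule, for illustration).
    """
    bands = {}
    for hour, values in history.items():
        m, s = mean(values), stdev(values)
        bands[hour] = (m - k * s, m + k * s)
    return bands


def is_abnormal(bands, hour, cpu):
    """True if a reading falls outside the learned band for that hour."""
    low, high = bands[hour]
    return cpu < low or cpu > high


# Hypothetical history for one VM: busy during a 02:00 batch window, quiet at 14:00.
history = {
    2: [90, 92, 88, 91, 89],
    14: [20, 22, 19, 21, 18],
}
bands = learn_baseline(history)

print(is_abnormal(bands, 2, 91))   # False: high CPU is normal during the batch window
print(is_abnormal(bands, 2, 5))    # True: batch job did not run, utilization is below normal
print(is_abnormal(bands, 14, 90))  # True: the same 90% is abnormal in the afternoon
```

Because the band is learned per VM and per time of day, the same reading can be normal in one context and abnormal in another, and readings that are too low are flagged as well as readings that are too high.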

The vCOPS Management Suite includes several other tools that are also useful in analyzing and diagnosing issues and relationships in your environment. The "Where Do I Get It?" section lists the options available in the suite, as well as details on other products that come with vCOPS, and the "What Editions Are Available?" section lists the capabilities available in each edition.

Most environments with more than a few servers and a few dozen VMs need vCOPS (or a similar tool). There are simply too many things going on, and too few administrators to watch them all, to manage issues effectively. There are also too many false alarms raised for things that may be normal in an environment, such as a server using a lot of CPU for batch processing overnight. This could be fixed by adjusting alarm values for those VMs, but that requires a lot of data-gathering and analysis to figure out what is "normal" for each VM, and then implementing those custom alarms on all affected VMs, which leads to management by exception. The more manual data-gathering and analysis an environment requires, the more complex and costly it is to manage.

