Alerts Cycle in vROPs

vRealize Operations can show us a lot of alarms from our environment. It’s pretty common to open it and find more than 10,000 generated alarms. I thought it would be a good idea to explain how they are generated, canceled, and deleted from the environment, and which parameters control those decisions. Adjusting these parameters can greatly help keep vROPs focused on the real problems of the environment and avoid false positives!

vROPs Collection Interval

First of all, we have to understand what a collection cycle is in vROPs. vROPs has a collection cycle of 5 minutes by default, which means that every 5 minutes it collects information from vCenter. It is important to remember that each collection (a point on the chart) is an average of the 15 samples that vCenter takes at 20-second intervals. Sunny Dua has a post that explains this math perfectly; check it out!
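As a rough illustration of that averaging, the sketch below collapses fifteen 20-second samples into a single 5-minute point. The sample values and the function name are made up for the example, and this is not a description of how vROPs is implemented internally.

```python
SAMPLE_INTERVAL_S = 20       # vCenter sample interval (seconds)
COLLECTION_INTERVAL_S = 300  # vROPs default collection cycle (5 minutes)


def collection_point(samples):
    """Average the 20-second samples that fall within one collection cycle."""
    expected = COLLECTION_INTERVAL_S // SAMPLE_INTERVAL_S  # 15 samples
    if len(samples) != expected:
        raise ValueError(f"expected {expected} samples, got {len(samples)}")
    return sum(samples) / len(samples)


# Hypothetical CPU usage (%): 12 samples at 20%, then a one-minute spike to 100%.
samples = [20] * 12 + [100] * 3
point = collection_point(samples)
print(point)  # 36.0
```

A one-minute spike to 100% ends up as a 36% point on the chart, which is why short peaks look much smaller in vROPs than in the vCenter real-time view.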

The default value is suitable for most environments. Shrinking it will consume more storage and CPU to process the additional data; increasing it will consume less of both. When in doubt, do not change it! You can confirm this setting in the path shown in the figures below.
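To get a feel for the trade-off, here is a small back-of-the-envelope count of how many points a single metric produces per day at different intervals. The 1-minute and 10-minute values are hypothetical choices for comparison, and the count ignores any compression or roll-up that vROPs may apply to stored data.

```python
MINUTES_PER_DAY = 24 * 60


def points_per_day(interval_minutes):
    """Data points collected per metric per day for a given collection interval."""
    return MINUTES_PER_DAY // interval_minutes


# 5 minutes is the default; the other intervals are only for comparison.
for interval in (1, 5, 10):
    print(interval, points_per_day(interval))
```

Going from 5 minutes to 1 minute multiplies the number of points by five for every metric of every object, which is where the extra storage and CPU go.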

Alarms and Symptoms

An alarm in vROPs is defined by one or more symptoms. For the alarm to be true, all the conditions imposed by its symptoms must be true. Let's use the "Virtual Machine CPU Usage is 100% for an extended period of time" alarm in this article to understand this behavior. This alarm has only one symptom, "Virtual Machine sustained CPU Usage is 100%", as shown in the image below.



To understand the alarm, we have to see what makes the symptom true. To see this information, simply follow the path shown in the image below.

Click the pencil icon to open the symptom settings. It will open the screen shown in the image below. Arrow number 1 shows which metric the symptom checks. Arrow number 2 shows the threshold that triggers the symptom. This symptom verifies whether the metric CPU | Usage (%) is equal to or greater than 100%. But being equal to or greater than 100% still does not make the symptom true!

Wait Cycle and Cancel Cycle

vROPs alarms and symptoms have two settings: Wait Cycle and Cancel Cycle. At the alert level, these settings can be checked in the path shown in the image below. At the symptom level, you can check them in the area indicated by square number 3 in the image below.


Wait Cycle is the number of collection cycles for which the condition must be true before the symptom is triggered. In our example, the symptom is true when CPU | Usage (%) is equal to or greater than 100% for 6 cycles. As each cycle is a 5-minute interval, the virtual machine has to stay at 100% CPU for 30 minutes for the symptom to be true.

Cancel Cycle is the opposite: it is the number of cycles for which the condition must be false before the symptom is canceled. In this case, the CPU Usage metric has to stay below 100% for 6 cycles for the symptom to be canceled.
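The interplay between the two settings can be sketched as a small state machine. This is only a simulation to build intuition, not vROPs code: the function name is hypothetical, and it assumes the cycles must be consecutive, so a single cycle on the other side of the threshold resets the count.

```python
def symptom_states(values, threshold=100.0, wait_cycle=6, cancel_cycle=6):
    """Return the symptom state (True = active) after each collection cycle."""
    active = False
    streak = 0  # consecutive cycles that contradict the current state
    states = []
    for value in values:
        breached = value >= threshold
        if breached != active:
            streak += 1
            needed = cancel_cycle if active else wait_cycle
            if streak >= needed:
                active = not active
                streak = 0
        else:
            # Assumption: one cycle that agrees with the current state resets the count.
            streak = 0
        states.append(active)
    return states


# Hypothetical VM: 6 cycles at 100% CPU, then 6 cycles at 50%.
history = symptom_states([100] * 6 + [50] * 6)
print(history)
```

With the values from our example, the symptom becomes active at the 6th cycle (30 minutes at 100%) and is canceled at the 12th cycle (30 minutes below 100%). Calling the function with wait_cycle=1 shows how much more sensitive the symptom becomes.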

With Wait Cycle and Cancel Cycle you can customize how responsive the analysis of vRealize Operations will be. Do you want a more sensitive alarm? Simply turn down the Wait Cycle. Do you want a more conservative alarm? Increase the Wait Cycle. The same goes for Cancel Cycle.

Remember I said that alerts also have Wait Cycle and Cancel Cycle settings?

All alarms have the Wait Cycle set to 1 to ensure that the alarm is activated as soon as all the symptoms that form it are true. In our example, as soon as the symptom is true, the alarm is activated and appears in the Alarms tab with the active status. They also have the Cancel Cycle set to 1 to ensure the alarm is canceled once its symptoms are no longer true.

The best way to control sensitivity is to configure Wait Cycle and Cancel Cycle at the symptom level. Leave the alert-level settings at the default value of 1!

Millions of Inactive Alarms

We have seen how alarms become active and how they are canceled. After an alarm is canceled, it appears with an inactive status, as indicated in the image below.

The problem is that you can start to see many alarms in that state piling up on your Alarms tab. vROPs keeps canceled alarms and symptoms for 45 days after they are canceled (by Cancel Cycle or manually by a user). If 45 days is too long for your environment, you can change this value in the path shown in the image below. In my vROPs I had already changed this retention period to 2 days (a very low value, just so I could test that inactive alarms were deleted). Which value to use will depend on your company's information retention policies 😊
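The retention rule boils down to simple date arithmetic. The sketch below only illustrates the idea; the function name and the dates are hypothetical, and it does not reflect how or when vROPs actually runs its cleanup.

```python
from datetime import datetime, timedelta


def is_purgeable(canceled_at, now, retention_days=45):
    """True when a canceled alarm is older than the retention window."""
    return now - canceled_at > timedelta(days=retention_days)


# Hypothetical alarm canceled 3 days before "now".
now = datetime(2024, 3, 1)
canceled_at = datetime(2024, 2, 27)

keep_default = is_purgeable(canceled_at, now)                # 45-day retention
purge_short = is_purgeable(canceled_at, now, retention_days=2)
print(keep_default, purge_short)  # False True
```

With the default of 45 days the inactive alarm is still listed, while with my test value of 2 days it is already eligible for deletion.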


I hope this article makes it easier for you to understand vROPs alarms. If something is not right or you have any questions, don’t be shy! Use the comment section below!