vRealize Operations Manager (vROps) Health-Check for versions prior to 6.5.

This post is not intended to be the traditional front-end health check of the appliance. Instead, it focuses on the back end, specifically the Cassandra database on the data nodes. I decided to write it because of the various issues I have encountered managing two large production deployments. The largest contains 9 data nodes and 3 remote collectors collecting and processing more than 3,829,804 metrics.

The first check is the database sync state between the data nodes, including the master and master replica. This can also help explain unusual disk growth on one or more of the data nodes. Open an SSH session to the master appliance and issue the following command:

# $VMWARE_PYTHON_BIN /usr/lib/vmware-vcops/tools/vrops-platform-cli/vrops-platform-cli.py getShardStateMappingInfo


The part of the output to focus on looks similar to the following example:

   "stateMappings": {
    "vrops1": {
      "vRealize Ops Shard-0724c812-9def-4391-9efa-2395d701d43e": {
        "state": "SYNCHING"
      "vRealize Ops Shard-77839361-986c-4817-bbb3-e7f4f1827921": {
        "state": "SYNCHING"
      "vRealize Ops Shard-8469fdff-55f0-49f7-a0e7-18cd6cc288c0": {
        "state": "RUNNING"
      "vRealize Ops Shard-8c8d1ce4-36a5-4f23-b77d-29b839156383": {
        "state": "RUNNING"
      "vRealize Ops Shard-ab79572e-6372-48d2-990d-d21c884b46fb": {
        "state": "RUNNING"
      "vRealize Ops Shard-bfa03b9e-bac9-4040-b1a8-1fd8c2797a6a": {

Each “vRealize Ops Shard” entry refers to a data node, including the master and master replica nodes. The possible states are RUNNING, SYNCHING, BALANCING, OUT_OF_BALANCE, and OUT_OF_SYNC.

  • States of RUNNING, SYNCHING and BALANCING are normal healthy states.
  • OUT_OF_BALANCE and OUT_OF_SYNC are cause for concern, and either one is reason enough to open an SR with VMware. Before you do, though, let’s look a little deeper to see whether more is going on. What you find may be useful information to give VMware GSS.
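If you run this check regularly, a small filter can summarize the output for you. The following is a minimal bash sketch. It assumes each shard's state appears as "state": "<STATE>" on its own line, as in the sample above; the real output may differ between vROps versions. The function name check_shard_states is hypothetical and is not part of vROps.

```shell
#!/usr/bin/env bash
# Sketch only: summarize shard states from getShardStateMappingInfo output.
# Assumption: each state appears as "state": "<STATE>" (as in the sample above).
# check_shard_states is a hypothetical helper name, not a vROps tool.

check_shard_states() {
  local states bad
  # Keep only the state values, one per line (field 4 when splitting on quotes).
  states="$(grep -o '"state": *"[A-Z_]*"' | awk -F'"' '{print $4}')"
  if [ -z "$states" ]; then
    echo "no shard states found in input" >&2
    return 2
  fi
  # Print a count per state, e.g. "3 RUNNING".
  printf '%s\n' "$states" | sort | uniq -c | awk '{print $1, $2}'
  # OUT_OF_BALANCE and OUT_OF_SYNC are the states treated as cause for concern.
  bad="$(printf '%s\n' "$states" | grep -c -E '^(OUT_OF_BALANCE|OUT_OF_SYNC)$')"
  if [ "$bad" -gt 0 ]; then
    echo "WARNING: $bad shard(s) OUT_OF_BALANCE or OUT_OF_SYNC" >&2
    return 1
  fi
  return 0
}
```

To use it, pipe the platform CLI output into the function, for example $VMWARE_PYTHON_BIN /usr/lib/vmware-vcops/tools/vrops-platform-cli/vrops-platform-cli.py getShardStateMappingInfo | check_shard_states. A non-zero exit status means at least one shard reported OUT_OF_BALANCE or OUT_OF_SYNC, or that no states were found.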

The vRealize Operations Manager appliance uses an Apache Cassandra database. This next command looks at the database load using a Cassandra utility called nodetool. It only gathers operational statistics from the database, so it is safe to run because we are not making any system changes here.

  • A good time to use this utility is when you start getting alerts from data nodes about any of these conditions: high load, the Cassandra DB service crashing and recovering, or a data node disconnecting and reconnecting.
  • I’ve also seen high database load cause failed upgrades and failed vROps cluster expansions.
# $VCOPS_BASE/cassandra/apache-cassandra-2.1.8/bin/nodetool --port 9008 status

This will return output similar to:

Datacenter: datacenter1
|/ State=Normal/Leaving/Joining/Moving
--   Address    Load    Tokens  Owns  Host ID                               Rack
UN  80.08 GB  256     ?   232f646b-3fbc-4388-8962-34000c1bb48b  rack1
UN  80.53 GB  256     ?   1bfec59b-3ba8-4ca0-942f-5bb2f97b7319  rack1
UN  80.11 GB  256     ?   da6c672c-cc69-4284-a8f5-2775441bb113  rack1
UN  79.33 GB  256     ?   ee4a3c3f-3f0f-46ac-b331-a816e8eb37c5  rack1
DN  75.13 GB  256     ?   19e80237-6f2c-4ff6-881e-ce94870e0ca5  rack1

Note: Non-system keyspaces don't have the same replication settings, 
effective ownership information is meaningless


  • The column to watch here is Load. Under ideal operating conditions, I have been told this should be under 5 GB per node. This command does not return data for the remote collectors because they do not contain a database.
  • If the database load is over 5 GB, you will need to open an SR with VMware GSS. Include this information along with the usual appliance log bundle. In this example, my data nodes had over 70 GB of load.
  • If nodetool returns the error nodetool: Failed to connect to ‘’ – ConnectException: ‘Connection refused’, check out KB2144358. You may be able to get that node back online before calling GSS.
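To spot overloaded nodes quickly, the Load column can be checked with awk. This is a minimal sketch, assuming the Cassandra 2.1 "nodetool status" layout shown above, where the load is printed as a number followed by a unit such as "80.08 GB". The 5 GB default mirrors the rule of thumb above rather than an official limit. The helper name check_nodetool_load is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: flag nodes in "nodetool status" output whose Load exceeds a
# threshold (default 5 GB, the rule of thumb mentioned above, not an official limit).
# check_nodetool_load is a hypothetical helper name.

check_nodetool_load() {
  local limit_gb="${1:-5}"
  awk -v limit="$limit_gb" '
    # Node rows start with a status/state code such as UN (Up/Normal) or DN (Down/Normal).
    $1 ~ /^[UD][NLJM]$/ {
      load_gb = -1
      for (i = 2; i < NF; i++) {
        # Load is printed as a number followed by a unit, e.g. "80.08 GB".
        if ($i ~ /^[0-9.]+$/ && $(i + 1) ~ /^(bytes|KB|MB|GB|TB)$/) {
          unit = $(i + 1)
          if (unit == "TB") load_gb = $i * 1024
          else if (unit == "GB") load_gb = $i + 0
          else if (unit == "MB") load_gb = $i / 1024
          else if (unit == "KB") load_gb = $i / 1048576
          else load_gb = $i / 1073741824
          break
        }
      }
      if (load_gb < 0) next
      status = (load_gb > limit + 0) ? "OVER" : "ok"
      if (status == "OVER") over++
      if ($1 ~ /^D/) down++
      # Host ID is the second-to-last column (the last one is Rack).
      printf "%s %s %.2f GB %s\n", $1, $(NF - 1), load_gb, status
    }
    END {
      if (down > 0) printf "WARNING: %d node(s) reported as Down\n", down
      if (over > 0 || down > 0) exit 1
      exit 0
    }'
}
```

For example, run $VCOPS_BASE/cassandra/apache-cassandra-2.1.8/bin/nodetool --port 9008 status | check_nodetool_load 5. The function exits non-zero if any node is over the threshold or reported Down. The parser searches for the number-plus-unit pair instead of using a fixed column position, so it works whether or not the Address column is populated.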

In my experience, when the database load is high, GSS usually needs to take two actions. First, they truncate the activity, activity results, and queue ID tables. Then they run a parallel nodetool repair on all data nodes, starting with the master, to get the appliance back on its feet. These are the steps usually performed:

  1. Take a snapshot of the master, master replica, and data nodes so you can roll back if any issues arise. Remote collectors can be skipped.
  2. Leave the cluster ONLINE
  3. Take analytics offline on the master and all data nodes:
  • Perform this step in parallel. Start the command on each node one after another, without waiting for it to complete on one node before moving on to the next, for example by using a separate terminal per node. A simple for-loop calling ssh sequentially isn’t sufficient. It is best to ensure the master and data nodes all go offline together.
# service vmware-vcops stop analytics
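Separate terminals work fine for this step. If you would rather script it, one option is to background each ssh call so the loop never waits for a node, then wait for all of them together. This is a minimal sketch, assuming root ssh access to each node. The node names are hypothetical placeholders, and run_on_nodes_parallel is not a vROps tool. Setting DRY_RUN=1 only prints what would run.

```shell
#!/usr/bin/env bash
# Sketch only: launch one command on several nodes at nearly the same time.
# Each ssh call is backgrounded (&) so the loop does not wait for a node to
# finish before starting the next; "wait" then collects every exit status.
# Assumptions: root ssh access to each node; node names are placeholders.
# run_on_nodes_parallel is a hypothetical helper. DRY_RUN=1 only prints commands.

run_on_nodes_parallel() {
  local cmd="$1"; shift
  local node pid rc=0
  local pids=()
  for node in "$@"; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "[$node] would run: $cmd" &
    else
      # Keep each node's output in its own log file for later review.
      ssh "root@$node" "$cmd" > "/tmp/parallel-$node.log" 2>&1 &
    fi
    pids+=("$!")
  done
  # Wait for every background job; remember whether any of them failed.
  for pid in "${pids[@]}"; do
    wait "$pid" || rc=1
  done
  return "$rc"
}

# Example (hypothetical node names):
# run_on_nodes_parallel "service vmware-vcops stop analytics" vrops-master vrops-data1 vrops-data2
```

Backgrounding the jobs is what separates this from the sequential for-loop the step warns against. The nodes still start a fraction of a second apart, so compare the per-node logs in /tmp afterwards to confirm every node actually stopped.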

4.  Repair the RESOURCE_STATE_DELETE flags for non-existing resources that are to be deleted:

  • On the master node only execute:
# su - postgres -c "/opt/vmware/vpostgres/current/bin/psql -d vcopsdb -p 5433 -c \"update resource set resource_flag='RESOURCE_STATE_NORMAL' where resource_flag='RESOURCE_STATE_DELETE';\""

5.  Perform Cassandra maintenance on the master node only. Afterward, you will run a Cassandra repair on all nodes, which syncs the other nodes' databases with the master. There are four commands here; run them in order:

# $VCOPS_BASE/cassandra/apache-cassandra-2.1.8/bin/cqlsh --ssl --cqlshrc $VCOPS_BASE/user/conf/cassandra/cqlshrc -e "consistency quorum; truncate globalpersistence.activity_tbl"
# $VCOPS_BASE/cassandra/apache-cassandra-2.1.8/bin/cqlsh --ssl --cqlshrc $VCOPS_BASE/user/conf/cassandra/cqlshrc -e "consistency quorum; truncate globalpersistence.activityresults_tbl"
# $VCOPS_BASE/cassandra/apache-cassandra-2.1.8/bin/cqlsh --ssl --cqlshrc $VCOPS_BASE/user/conf/cassandra/cqlshrc -e "consistency quorum; truncate globalpersistence.queueid_tbl"
# $VCOPS_BASE/cassandra/apache-cassandra-2.1.8/bin/nodetool -p 9008 clearsnapshot 
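If you want the four commands to stop at the first failure, they can be wrapped in a small function. This is a minimal sketch that reuses the exact paths and commands above and only adds error handling. The helper name step5_cassandra_cleanup is hypothetical, and DRY_RUN=1 prints the commands for review instead of running them. Cassandra's auto_snapshot setting, which is on by default, makes TRUNCATE take a snapshot first; that is likely why clearsnapshot follows.

```shell
#!/usr/bin/env bash
# Sketch only: run the step 5 commands on the master in order, stopping at
# the first failed truncate. Paths and commands are the ones shown above;
# step5_cassandra_cleanup is a hypothetical helper. DRY_RUN=1 only prints them.

step5_cassandra_cleanup() {
  if [ -z "${VCOPS_BASE:-}" ]; then
    echo "VCOPS_BASE is not set" >&2
    return 2
  fi
  local bin="$VCOPS_BASE/cassandra/apache-cassandra-2.1.8/bin"
  local rcfile="$VCOPS_BASE/user/conf/cassandra/cqlshrc"
  local table cql
  for table in activity_tbl activityresults_tbl queueid_tbl; do
    cql="consistency quorum; truncate globalpersistence.$table"
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "$bin/cqlsh --ssl --cqlshrc $rcfile -e \"$cql\""
    elif ! "$bin/cqlsh" --ssl --cqlshrc "$rcfile" -e "$cql"; then
      echo "truncate of globalpersistence.$table failed; stopping" >&2
      return 1
    fi
  done
  # Fourth command: clear the Cassandra snapshots on this node.
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$bin/nodetool -p 9008 clearsnapshot"
  else
    "$bin/nodetool" -p 9008 clearsnapshot
  fi
}
```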

6.  Perform Cassandra maintenance on all nodes:

  • This is critical. The -pr (primary range) option makes each node repair only the token ranges it is the primary owner of. This improves performance, but it means ALL nodes in the Cassandra cluster must run the repair to get a complete and consistent result. See <www.datastax.com/dev/blog/repair-cassandra>.
  • Execute this on the master and all data nodes simultaneously:
# $VCOPS_BASE/cassandra/apache-cassandra-2.1.8/bin/nodetool -p 9008 repair -par -pr

7. To monitor the repair progress, you can start another SSH session to the master node and tail the following log:

# tail -f /storage/log/vcops/log/cassandra/system.log
  • The repair operation has two distinct phases. First it calculates the differences between the nodes (repair work to be done), and then it acts on those differences by streaming data to the appropriate nodes.

Generally speaking, you can also monitor the nodetool repair operation with these two nodetool commands, but this is not necessary:

  • netstats: This monitors the repair streams to the nodes:
# $VCOPS_BASE/cassandra/apache-cassandra-2.1.8/bin/nodetool --port 9008 netstats
  • compactionstats: This checks on the active Merkle Tree calculations:
# $VCOPS_BASE/cassandra/apache-cassandra-2.1.8/bin/nodetool --port 9008 compactionstats
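For a rough progress count without reading the whole log, you could grep system.log for repair messages. This is a sketch only. The message strings it searches for ("Starting repair command", "session completed successfully", "session completed with the following error") are assumptions based on the lines Cassandra 2.1 usually logs during repair, so confirm them against your own system.log first. The helper name repair_progress is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: count repair-related lines in the Cassandra system.log.
# The message strings below are assumptions about Cassandra 2.1 repair logging;
# verify them in your own log. repair_progress is a hypothetical helper.

repair_progress() {
  local log="${1:-/storage/log/vcops/log/cassandra/system.log}"
  if [ ! -r "$log" ]; then
    echo "cannot read $log" >&2
    return 2
  fi
  local started done_ok failed
  started=$(grep -c 'Starting repair command' "$log")
  done_ok=$(grep -c 'session completed successfully' "$log")
  failed=$(grep -c 'session completed with the following error' "$log")
  echo "repair commands started: $started"
  echo "repair sessions completed: $done_ok"
  echo "repair sessions failed: $failed"
  # Non-zero exit if any failed session was logged.
  [ "$failed" -eq 0 ]
}
```

The counts cover the whole log file, including older repairs, and a rotated log will reset them. Treat the output as a rough indicator alongside tail -f, netstats, and compactionstats.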

8. Perform the instance metric key ID clean-up on all nodes, in parallel on the master and data nodes. These commands remove the metric keys that match the snapshot-related disk space patterns below, which frees up disk space on each node:

# $VMWARE_PYTHON_BIN $VCOPS_BASE/tools/persistenceTool.py RemoveMatchedMetricKeys --outputFile /tmp/deleted_report.txt --regex "\"^diskspace-waste:.+?snapshot:.*\"" --remove true
# $VMWARE_PYTHON_BIN $VCOPS_BASE/tools/persistenceTool.py RemoveMatchedMetricKeys --outputFile /tmp/deleted_report2.txt --regex "\"^diskspace:.+?snapshot:.*(accessTime|used)$\"" --remove true

9.  Clean up the alarms & alerts on all nodes.   Perform this step in parallel on the master and data nodes:

# su - postgres -c "/opt/vmware/vpostgres/9.3/bin/psql -p 5432 -U vcops -d vcopsdb -c 'truncate table alert cascade;'"
# su - postgres -c "/opt/vmware/vpostgres/9.3/bin/psql -p 5432 -U vcops -d vcopsdb -c 'truncate table alarm cascade;'"

10.  Bring the analytics processes back online on the master, master replica, and data nodes. Unlike step 3, this step does not need to run in parallel, so you may use an ssh for-loop that executes the command on each node sequentially:

# service vmware-vcops start analytics
  • In my experience with a large cluster of 9+ nodes, the cluster can take 20 to 30 minutes to come back online.
  • If you log in to the https://<vrops>/admin page, you may see that the HA status is degraded or needs to be re-enabled. Give the appliance time; it will return to a healthy green state once it is fully online.
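The sequential ssh loop this step allows could look like the following minimal sketch. It assumes root ssh access from wherever you run it, and it takes the nodes in the order listed above: master, master replica, then data nodes. The node names are hypothetical placeholders, and start_analytics_sequential is not a vROps tool. Setting DRY_RUN=1 prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Sketch only: start analytics on each node in turn, waiting for each ssh
# call to return before moving to the next node.
# Assumptions: root ssh access; node names are placeholders.
# start_analytics_sequential is a hypothetical helper. DRY_RUN=1 only prints.

start_analytics_sequential() {
  local node rc=0
  for node in "$@"; do
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "[$node] service vmware-vcops start analytics"
    else
      echo "Starting analytics on $node..."
      # Keep going if one node fails, but report it and return non-zero at the end.
      if ! ssh "root@$node" "service vmware-vcops start analytics"; then
        echo "start failed on $node" >&2
        rc=1
      fi
    fi
  done
  return "$rc"
}

# Example (hypothetical names, master first):
# start_analytics_sequential vrops-master vrops-replica vrops-data1 vrops-data2
```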

11. Once the cluster is fully online and you can confirm the data is being collected, delete the snapshots you took earlier.

12. On the master node, run the shard state command again:

# $VMWARE_PYTHON_BIN /usr/lib/vmware-vcops/tools/vrops-platform-cli/vrops-platform-cli.py getShardStateMappingInfo

You should see something similar to:

   "stateMappings": {
    "vrops1": {
      "vRealize Ops Shard-0724c812-9def-4391-9efa-2395d701d43e": {
        "state": "SYNCHING"
      "vRealize Ops Shard-77839361-986c-4817-bbb3-e7f4f1827921": {
        "state": "SYNCHING"
      "vRealize Ops Shard-8469fdff-55f0-49f7-a0e7-18cd6cc288c0": {
        "state": "RUNNING"
      "vRealize Ops Shard-8c8d1ce4-36a5-4f23-b77d-29b839156383": {
        "state": "SYNCHING"
      "vRealize Ops Shard-ab79572e-6372-48d2-990d-d21c884b46fb": {
        "state": "SYNCHING"
      "vRealize Ops Shard-bfa03b9e-bac9-4040-b1a8-1fd8c2797a6a": {
        "state": "SYNCHING"

13. On the master node, run the nodetool status command again:

# $VCOPS_BASE/cassandra/apache-cassandra-2.1.8/bin/nodetool --port 9008 status

You should see something similar to:

Datacenter: datacenter1
|/ State=Normal/Leaving/Joining/Moving
--   Address    Load    Tokens  Owns  Host ID                               Rack
UN  120.20 MB  256     ?   232f646b-3fbc-4388-8962-34000c1bb48b  rack1
UN  128.20 MB  256     ?   1bfec59b-3ba8-4ca0-942f-5bb2f97b7319  rack1
UN  120.11 MB  256     ?   da6c672c-cc69-4284-a8f5-2775441bb113  rack1
UN  115.33 MB  256     ?   ee4a3c3f-3f0f-46ac-b331-a816e8eb37c5  rack1
DN  128.13 MB  256     ?   19e80237-6f2c-4ff6-881e-ce94870e0ca5  rack1

Note: Non-system keyspaces don't have the same replication settings, 
effective ownership information is meaningless
  • Ideally, the Cassandra database load should now be around 1 GB or less per node.

14. Log in to the regular web interface and edit the policy to stop collecting snapshot metrics. This will help overall performance going forward.


Creating vROps Policies and How To Apply Them To Object Groups.

Creating policies in VMware’s vRealize Operations appliance can be straightforward if you have a decent understanding of every platform it is monitoring. In my last post in this series, I covered creating object groups. That post matters here because policies can be created and assigned to those object groups, letting you tune the alerts received for those groups.

Once logged in to the vROps appliance, go into the administration section, and there you will find the policies.


  • VMware has included many base policies in the policy library. In most cases these will be fine for the initial configuration of the appliance, but you may want to create additional policies to suit your specific environment.
  • Also note the blue film strip icon in the upper right corner. It takes you to VMware’s video repository, which has an explanation of policies and a brief how-to video. These video links appear throughout the appliance’s configuration screens, and more are added with each release.

To create a new policy click on the green plus sign to get started.  Give the policy a unique name, and it would be good practice to give a description of what the policy is intended to do.  When creating a policy, you have the ability to “start with” a VMware pre-defined policy, and I recommend taking advantage of that until there is a firm understanding of what these policies do.


On the Select Base Policy tab, you can use the drop down menu on the left to get a policy overview of what is being monitored.  In this example, Host system was selected.


Policy overrides can also be incorporated into this policy. For example, if there are certain alerts you do not want, one of the pre-defined policies may already have them turned off. That policy can then be added to the new one being created here. Work smarter, not harder, right?


Moving along to the Analysis Settings tab, here you can see how vROps analyzes alerts, determines thresholds, and assigns system badges. You can leave these at the settings inherited from the policy you are building on, or click the padlock on the right to make individual changes. Keep in mind that the “Show changes for” drop-down menu lets you choose from many object types when changing the analysis settings.


The Alert/Symptom Definitions tab is probably where you will spend most of your time. The “Alert Definitions” box at the top is where alerts can be turned on or off. The starting state for each alert comes from the base policy used to create this one, or from the override policies used.

  • Each installed management pack has its own category for object types. For example, “host system” is listed under the vCenter category, but if the vCloud Director management pack is installed, it also has a “host system” under its own category. A management pack can also add alerts for objects referenced in other management packs, so take time going through each category to see which alerts may need configuring.
  • Each alert’s State is one of the following:
    • Local with a green check mark: you enabled it.
    • Inherited with a grey check mark: it is enabled by another policy used to create this one.
    • Local with a red crossed-out circle: you disabled it for this policy.
    • Inherited with a greyed-out crossed-out circle: it is disabled by another policy used to create this one.
    Disabling an alert here still allows metrics to be collected for the object; you just won’t get the alarm for it.
  • The Symptom Definitions section has the same “object type” drop-down menu. Select the object type there to configure the thresholds that trigger the symptoms for the alert selected in the Alert Definitions box above. I typically configure the two in tandem.


Finally, you can apply the policy to the custom groups you created before in the Apply Policy to Groups tab.


Once you click save, and go back to the Active Policies tab, you will be able to see the new policy created, and within five minutes, you should see the Affected Objects count rise.  You can see here that I have a policy marked with “D” meaning it is the default appliance policy.  You can set your own policy as default by clicking the blue circle icon with the arrow on the upper left side.  It may take up to 24 hours before the active alert page reflects the settings of the new policy.  Otherwise you can manually clear those alerts.


Previous post to this series: Configuring VMware vRealize Operations Manager Object Groups

Configuring VMware vRealize Operations Manager Object Groups

There are two sections I will cover in this post: ‘Group Types’ and ‘Object Groups’. As an example of when you might consider creating a group type, say you have multiple data centers. A group type could be used to group all object types of one data center into a single folder. In other words, the group type for the data center in Texas would act as a sorting container for group objects such as datastores, vCenters, hosts, and virtual machines, keeping them separate from the data center in New York.

The way you can do this is by clicking on the Content icon, selecting group types and then clicking the green plus sign to add a new group type.


Next you can click on the environment icon (blue globe), Environment Overview, click the green plus icon to create a new object group, and then in the group type drop down, you can select the group type you just created.  As far as policy selection goes, the built in VMware policies are a great place to start.  You can easily update this selection later when you create a custom policy.  I would recommend checking ‘Keep group membership up to date’.


Next is the Define membership criteria section. This is where the water can get muddy, because you will have more than one way to target your desired environment objects. The list of object types in the ‘Select the Object Type that matches all of the following criteria’ drop-down menu grows with each additional adapter Management PAK installed on the vROps appliance. This selection also matters because of the way vROps generates alerts from the management packs.

– An example would be alerts on Host Systems. One would assume you would select the vCenter Adapter and then Host Systems. However, suppose you have the vCloud Director Adapter Management PAK installed. That PAK also has metrics for Host Systems that vROps will alert on, so you would also need to select Host Systems under that solution to target those alerts and systems.


For this example, we will use Host System under the vCenter Adapter. There are different ways to target the systems; here I will showcase the Object name, contains option. This option lets us target several systems IF they share a common name, like so:


You also have the option to target systems based on their Relationship. In this example we have clusters of hosts grouped under the name MGMT. To target all systems in that cluster, I chose Relationship, Child of, contains, MGMT, like so:


There is a Preview button in the lower left corner which you can use to see if your desired systems are picked up by the membership criteria you selected.


You can also target multiple system names by using the Add another criteria set option like so:


Depending on the size of the environment, I’ve noticed that it helps to make a selection in the ‘in navigation tree’ drop-down as well.

When you have the desired systems targeted in the group, click OK. Groups are subject to the metrics collection interval, so the groups page will show grey question marks next to the groups until the next interval has passed. Any policies applied to these custom groups will not change the alerts page until that metrics collection has occurred.

The added benefit to having custom groups is that vROps will also show you the group health, and if you click on the group name you will be brought to a custom interface with alerts only for that group of targeted systems.


In my next post I will cover creating policies and how to apply them to object groups.


Next Post: Creating vROps Policies and How To Apply Them To Object Groups.

Previous Post: Add The vROps License, Configuring LDAP, and The SMTP For vRealize Operations Manager (vROps)


Add The vROps License, Configuring LDAP, and The SMTP For vRealize Operations Manager (vROps)

If you’ve been following my previous posts, I discussed what vRealize Operations Manager is, how to get the appliance deployed, how to get the master replica, data nodes and remote collectors attached to the cluster, and finally how to get data collection started.

Now it’s time to license vRealize Operations Manager. Log in to the appliance at <https://vrops-appliance/ui/login.action>. Next, go to the Administration section, where you’ll see Licensing. Click the green plus sign to add the vROps license.


About seven items below Licensing in the left-hand column, you will see Authentication Sources. This is where you configure LDAP.


Again click the green plus sign to configure the LDAP source.


Once the credentials have been added, test the connection and then if everything checks out click OK.

Lastly, let’s get the SMTP service configured. About three items below Authentication Sources, you’ll find Outbound Settings. Click the green plus sign to add a new SMTP instance.


Once you have the SMTP information added, test the connection, and if everything checks out click save.


So now you should have a functioning, licensed vROps instance. In future posts I will cover creating object groups and policies, and configuring some alert emails.

Next Post: Configuring VMware vRealize Operations Manager Object Groups

Last Post: Configuring VMware vRealize Operations Manager Adapters For Data Collection

Configuring VMware vRealize Operations Manager Adapters For Data Collection

If you’ve followed my recent blog post on Installing vRealize Operations Manager (vROps) Appliance, you are now ready to configure the built-in vSphere adapter to start data collection.

Depending on how big your environment is, and IF you have remote collectors deployed, you may want to consider configuring collector groups. A collector group lets you group multiple remote collectors within the same data center for resiliency. When the vROps adapters point to the collector group instead of an individual remote collector, and one remote collector goes down, the others essentially pick up the slack. They continue collecting from that data center, so there is no data loss. You can also create a collector group for a single remote collector, which makes it easy to add that resiliency later.

Log in to the appliance using the regular UI at <https://vrops-appliance/ui/login.action>. From here, click Administration. If you only need to configure the vSphere adapter for data collection, you can skip ahead to Section 2. Otherwise, let’s continue with Section 1 and configure the collector groups.

Section 1

Click on Collector Groups


You can see that I already have collector groups created for my remote data centers. To create a new one, just click the green plus sign.


Give the collector group a name, and then in the lower window select the corresponding remote collector.  Then rinse-wash-and-repeat until you have the collector groups configured.  Click Save when finished.  Now lets move on to Section 2.

Section 2

From the Administration area, click on Solutions


Because this is a new deployment, you will only have Operating Systems / Remote Service Monitoring and VMware vSphere. For the purpose of this post I will only cover configuring the VMware vSphere adapter. Click it to select it, and then click the gears to the right of the green plus sign.


Here, fill out the display name, the vCenter Server it will connect to, and the credentials. If you click the drop-down arrow next to Advanced Settings, you will see the Collectors/Groups drop-down menu. Expand it if you created collector groups in Section 1, and select the desired group. Otherwise vROps will use the Default collector group, which is fine if you only have one data center. If you have more than one and no collector group configured, I recommend at least selecting a remote collector here. This puts the data collection load on the remote collectors and lets the cluster focus on processing all of that lovely data. Click Test Connection to verify connectivity, and then click Save. Then rinse-wash-and-repeat until all of your vCenters are collecting. Click Close when finished.

Note that vROps collects data every five minutes by default, which is currently the lowest setting possible. You can monitor the status of your solutions or adapters here. Once they start collecting, their statuses will change to green.


If you’d like to add additional solutions, otherwise known as “Management PAKs”, head over to VMware’s Solution Exchange. I currently work for a cloud provider running NSX, so I also have the NSX and vCloud Director Management PAKs installed. From the same Solutions page, click the green plus sign instead of the gears and add the additional solutions to your environment. You would also use this when updating solutions to newer versions. Currently there is no mechanism to notify you when a newer version is available.


Go to Global Settings on the Administration page, where you can configure the object history, or data retention policy, along with a few other settings.


Finally, go back home by clicking the house icon. By now the Health, Risk, and Efficiency badges should all be colored. Ideally they are green, but your results may vary. This is the final indication that vROps is collecting.



Next Post: Add The vROps License, Configuring LDAP, and The SMTP For vRealize Operations Manager (vROps)

Recent Post: Sizing and Installing The VMware vRealize Operations (vROps) Appliance

Sizing and Installing The VMware vRealize Operations (vROps) Appliance

VMware has a sizing guide that will aid you in determining how many appliances you need to deploy.  If you have multiple data centers, and somewhere north of 200 hosts, and more than 5,000 VMs, I’d recommend at least starting out with two servers configured as Large deployments.  Once you get the built in vSphere adapter collecting for each environment, you can run an audit on the environment using vROps to get the raw numbers, and expand the cluster accordingly.  Come prepared.  Walk through your environments and get a list of how many hosts, data stores, vCenters, and get a rough count of the virtual machines deployed.

KB2093783 has more details on the sizing, and I strongly urge you to visit the KB, as there are links to the latest releases of vROps, and each KB has a sizing guide attachment at the bottom, where you can input the information you collected from your environment to get a more accurate size.


Appliance Manual Installation


Architectural Note

  • Before proceeding be sure you have:
    • The appropriate host resources
    • The appropriate storage
    • IP addresses assigned and entered into DNS
    • a “read-only” account configured in AD and vCenter
    • The appropriate ports opened between data centers, as listed in VMware’s documentation


Once you have downloaded the latest edition of the vROps appliance OVF and consulted the documentation, use either the vSphere client or the web client to deploy the OVF template. I’ll skip browsing for the file, verifying the details, accepting the license agreement, and naming the appliance.

So now you’ve come to the OVF deployment step where you must select the size of your appliance.  No matter the size, the remainder of the deployment is the same, but for this example I will deploy an appliance as Large.

You can deploy the appliance in several sizing configurations depending on the size of your environment and those are: Extra Small, Small, Medium and Large.

  • Extra Small = 2 vCPUs and 8GBs of memory
  • Small = 4 vCPUs and 16GBs of memory
  • Large = 16 vCPUs and 48GBs of memory

You can also choose to deploy a remote collector and they come in two sizes:

  • Standard = 2 vCPUs and 4GBs of memory
  • Large = 4 vCPUs and 16GBs of memory


You will notice that VMware gives a description of what each selection entails. Choose the one that best suits your needs, then click Next.

Storage dialog

  • Depending on the size of your environment, each vROps VM can grow to over a terabyte
  • Once you’ve made your selection click next
  • Architectural Note – If adding a master replica node to your vROps cluster, I’d recommend keeping the Master and Master Replicas on separate XIVs, or whatever you use to serve up storage to your environment.

Disk Format dialog

  • The default is Lazy Zeroed, and that’s how my environments have been deployed. I’d strongly advise against using thin provisioning for this appliance.
  • Once you’ve made your selection click next.

Network Mapping dialog

  • Select the appropriate destination network like a management network, where it can capture traffic from your hosts, VMs, vCenters and datastores.
  • Once you’ve made your selection click next.

Properties dialog

  • Here you can set the Timezone for the appliance, and choose whether to use IPv6
  • Once you’ve filled out the network information, click next

Configuration Verification dialog

  • Read it carefully to be sure there were no fat fingers at play.  Click finish when ready.


Before powering on the appliance, you may want to take the opportunity to expand its disk. This can be done a couple of ways. You can expand the existing Hard Disk 2, but keep in mind that the current file system can only see disks under 2TB. The appliance won’t be able to see any space allocated beyond 2TB. For my production environment, I increased disk 2 to 1TB and then added 500GB disks as more storage was needed. Also keep in mind the amount of data you are going to retain. My appliances are configured for 6 months, but this can be changed as needs change; we’ll go over this later in another post. The cool thing about this appliance is that it expands the data partition automatically during power-on whenever you enlarge disk 2 or add storage.

Power up the appliance, open a console to it in vCenter to watch it boot up, and go through some scripted configurations.


  • To get logged in, press ALT + F1 keys.  Enter root for the user, leave the password blank and hit enter.  Now you will be prompted to input the current password, so leave it blank and hit enter.  Now enter a new password, hit enter and enter the new password once more for verification.
  • Depending on how locked down your environment is, this may not be possible, but I always ping an outside address along with a few internal servers to verify the network settings.
  • Also unless you really enjoy VMware’s console, I’d recommend running a couple commands to turn on SSH, so any future administrative tasks can be performed with a putty session.
    • The first command is:  # chkconfig sshd on
      • This enables the sshd service at system boot
    • The second command is: # service sshd start
      • This turns on the sshd service so you can connect to the box with a putty session.


Using Microsoft Edge, Firefox, or Chrome, browse to <https://vrops-appliance-name/>. This redirects you to the Getting Started page, where you can choose from three options:

Express Installation, where you can set the admin password and that’s pretty much it.


New Installation gives you a few more options to configure, like which NTP server(s) you want to use, and a TLS/SSL certificate you’ve created specifically for this system (or just use the built-in one).


Expand An Existing Installation – this option would be used for additional data nodes or remote collectors as you’ll have the option to pick under “node type”.


For this installation we will select New Installation. As a rule of thumb, and for better appliance performance, use the same NTP servers on your network that vCenter and the ESXi hosts use to keep time in check. Once you’ve made it through the wizard, click Finish.


It shouldn’t take too long for the master appliance to setup and take you to a log in screen.

You’re not done yet, however. If you have a master replica, additional data nodes, or remote collectors, you still need to connect them to the master to build out your cluster. Each node has its own web UI at <https://vrops-appliance-name/>, only this time you’ll use the Expand An Existing Installation option. You will also need to log into the admin section for some of this at <https://vrops-appliance-name/admin/login.action>.
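With several nodes to join, it helps to confirm that each node’s setup page and admin login page respond before you start expanding the cluster. The sketch below assumes Python 3 on a workstation; the helper names and any hostnames you feed it are hypothetical. Certificate verification is disabled because the built-in vROps certificate is typically not trusted by your workstation; only do this for a quick reachability check.

```python
import ssl
import urllib.error
import urllib.request


def node_urls(hostname: str) -> dict:
    """Return the setup UI and admin login URLs for a vROps node."""
    base = f"https://{hostname}/"
    return {
        "setup": base,
        "admin": base + "admin/login.action",
    }


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status code for url, without verifying the certificate."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False          # hostname check must be off before CERT_NONE
    ctx.verify_mode = ssl.CERT_NONE     # accept the appliance's untrusted cert
    try:
        with urllib.request.urlopen(url, timeout=timeout, context=ctx) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx still prove the web server is up; report the code.
        return err.code
```

For example, looping over your replica, data node, and remote collector hostnames and calling `http_status(node_urls(name)["admin"])` should return 200 for each node that is ready to be joined.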

Let’s get the master replica added first. When you use the Expand An Existing Installation option, you’ll need to add it as a data node. Then wait for the cluster to expand to it.


Then click the Finish Adding New Nodes button.


To enable HA, you’ll notice a High Availability option in the center of the screen, but it is disabled. Go ahead and click Enable.


Now select the data node that will be the master replica, make sure Enable High Availability is checked, and click OK. This part will take a little while, and the cluster services will be restarted. Afterward, the High Availability status will show as enabled.


Add any remaining data nodes and remote collectors using the Expand An Existing Installation Option.


Architectural Note

  • I’d recommend going into vCenter and adding a DRS anti-affinity rule to keep the master and master replica on separate hosts.
  • If you’ve deployed vROps to its own host cluster, I’d recommend setting the vSphere DRS migration threshold to Conservative. The appliances are usually pretty busy in an active environment, and having one vMotion on you can cause cluster performance degradation and will throw some interesting alarms within vROps. It will recover on its own, but it’s better to avoid when possible.
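The DRS rule does the enforcing, but it is worth auditing placement now and then, for example after host maintenance. This sketch assumes you have already exported a VM-to-host mapping from vCenter (how you export it is up to you); the VM and host names are hypothetical. It reports any host running more than one of the VMs you want kept apart.

```python
from collections import defaultdict


def colocated(placement: dict, vms: list) -> dict:
    """Return {host: [vms]} for hosts running more than one of the given VMs.

    placement maps VM name -> ESXi host name. An empty result means the VMs
    are on separate hosts, which is what the anti-affinity rule should ensure.
    Raises KeyError if a VM in vms has no placement entry.
    """
    by_host = defaultdict(list)
    for vm in vms:
        by_host[placement[vm]].append(vm)
    return {host: names for host, names in by_host.items() if len(names) > 1}
```

For example, with `{"vrops-master": "esx01", "vrops-replica": "esx01"}` the call `colocated(placement, ["vrops-master", "vrops-replica"])` flags `esx01`, meaning a single host failure would take out both halves of the HA pair.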


Next up: you’ll need to configure the built-in vSphere adapter so you can start collecting data. I’ll have more on that in my next post.


What Is VMware’s vRealize Operations Manager?

Formerly known as vCenter Operations Manager (vCOps), vRealize Operations Manager (vROps) really has become the center of the vSphere universe. vROps is an appliance that sits in your environment collecting system metrics from vCenter, virtual servers and ESXi hosts. It acts as a single pane of glass for the virtual environment, allowing the administrator to track and mitigate resource contention along with performance and capacity constraints. vROps will also “learn” about the environment, and given a couple of months of data collection, it can project things like when more capacity will be required based on growth and resource consumption, and that’s pretty cool. Data collection is not limited to VMware products, however: you can also install additional management PAKs from VMware’s Solution Exchange, and more are always being added. Which brings us to the next topic: sizing your vROps deployment.

Unlike vCOps, which consisted of two virtual machines within a vApp, vROps is a single virtual appliance that can be expanded and clustered for additional compute resources. A single appliance, or node, can be deployed as shown in Figure 1:

Figure 1 – A single vROps node


Now the cool thing about vROps is that it has built-in functionality for clustering two appliances together as a Master and Master Replica, giving you resiliency in case of a failure. You can also add more nodes to the cluster, known as data nodes, which allow you to collect and process even more metrics. It should go without saying, but as you add more Management PAKs from VMware’s Solution Exchange, keep in mind that you may need to add data nodes as well. It’s also important to note that the master and master replica count as data nodes, which matters because since vRealize Operations Manager 6.x, you can have a total of 16 data nodes in a cluster. That means you can have an HA pair and 14 additional data nodes. You can deploy the appliance in several sizing configurations depending on the size of your environment: Extra Small, Small, Medium and Large.

  • Extra Small = 2 vCPUs and 8 GB of memory
  • Small = 4 vCPUs and 16 GB of memory
  • Medium = 8 vCPUs and 32 GB of memory
  • Large = 16 vCPUs and 48 GB of memory

You can also deploy additional appliances known as remote collectors, and since vRealize Operations Manager 6.1.x, you can have a total of 50 remote collectors, allowing collection from up to 120,000 objects and 30,000,000 individual metrics. Now that’s a lot of data! These remote collectors come in different size configurations as well.

  • Standard = 2 vCPUs and 4 GB of memory
  • Large = 4 vCPUs and 16 GB of memory
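Because every node brings its own vCPU and memory reservation, it is worth adding up the total footprint before you request host capacity. The sketch below is simple arithmetic over the sizes listed in this post, plus the node-count limits above; the Medium figures (8 vCPUs, 32 GB) follow VMware’s 6.x sizing guidance, so check the sizing guidelines for your exact release. It assumes every data node is the same size and ignores hypervisor overhead and per-node object limits.

```python
# (vCPUs, memory in GB) per appliance size.
DATA_NODE_SIZES = {
    "extra_small": (2, 8),
    "small": (4, 16),
    "medium": (8, 32),  # per VMware 6.x sizing guidance; verify for your release
    "large": (16, 48),
}
REMOTE_COLLECTOR_SIZES = {
    "standard": (2, 4),
    "large": (4, 16),
}
MAX_DATA_NODES = 16         # master and master replica count toward this limit
MAX_REMOTE_COLLECTORS = 50


def footprint(data_node_size: str, data_nodes: int,
              rc_size: str = "standard", remote_collectors: int = 0) -> tuple:
    """Return (total vCPUs, total memory GB) for a vROps deployment."""
    if not 1 <= data_nodes <= MAX_DATA_NODES:
        raise ValueError(f"data nodes must be 1..{MAX_DATA_NODES}")
    if not 0 <= remote_collectors <= MAX_REMOTE_COLLECTORS:
        raise ValueError(f"remote collectors must be 0..{MAX_REMOTE_COLLECTORS}")
    cpu, mem = DATA_NODE_SIZES[data_node_size]
    rc_cpu, rc_mem = REMOTE_COLLECTOR_SIZES[rc_size]
    return (cpu * data_nodes + rc_cpu * remote_collectors,
            mem * data_nodes + rc_mem * remote_collectors)
```

For example, an HA pair plus four more Large data nodes (six in total) and two Standard remote collectors works out to `footprint("large", 6, "standard", 2)`, which is 100 vCPUs and 296 GB of memory.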

Figure 2 shows what a clustered installation looks like and where remote collectors fit in.

Figure 2 – A vROps cluster


As you can see, the main cluster (the master, replica and data nodes) shares a database and analytics processing engine, but the remote collector does not. Its goal is simply to act as a vacuum: it collects metrics in those remote data centers and pushes them back to the main cluster for processing and storage.
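When you keep an inventory of your deployment, it can help to model this split explicitly, since only the cluster nodes hold a share of the database and analytics work. The sketch below is a hypothetical inventory model that mirrors Figure 2; the role strings and hostnames are illustrative labels, not identifiers used by vROps itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    hostname: str
    role: str  # "master", "master_replica", "data" or "remote_collector"


# Roles that share the cluster database and analytics processing engine.
CLUSTER_ROLES = {"master", "master_replica", "data"}
KNOWN_ROLES = CLUSTER_ROLES | {"remote_collector"}


def split_by_role(nodes):
    """Separate main-cluster nodes from remote collectors.

    Cluster nodes store and analyze metrics; remote collectors only gather
    metrics and push them back to the cluster.
    """
    unknown = sorted({n.role for n in nodes if n.role not in KNOWN_ROLES})
    if unknown:
        raise ValueError(f"unknown role(s): {unknown}")
    cluster = [n.hostname for n in nodes if n.role in CLUSTER_ROLES]
    collectors = [n.hostname for n in nodes if n.role == "remote_collector"]
    return cluster, collectors
```

A split like this makes it easy to, for example, apply the anti-affinity and DRS guidance only to the cluster nodes while treating remote collectors as site-local appliances.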

All together, this makes for a fantastic resource for troubleshooting and for retaining historical metrics data. I will caution that the vROps appliance can require a lot of CPU and memory depending on your environment’s configuration, so be sure to have ample resources supporting it. To get the most from this appliance, I’d also recommend dedicating at least one engineer to vROps, as there is a great deal of information to be had, and much to configure and maintain.

A Final Word

As someone who has been responsible for several large deployments, I can tell you this appliance has come a long way from its former days as VMware vCenter Operations Manager, and the developers dedicated to this platform are hard at work making it even better as it becomes the center of the software-defined datacenter universe within the VMware stack.

There are excellent blogs over at VMware that dive deeper into this appliance and its capabilities. For more information, visit their site at blogs.vmware.com.