Shutdown and Startup Sequence for a vRealize Operations Manager Cluster

You ever hear the phrase “first one in, last one out”?  That is the methodology you should use when the need arises to shutdown or startup a vRealize Operations Manager (vROps) cluster.  The vROps master should always be the last node to be brought offline in vCenter, and the first node VM to be started in vCenter.

The proper shutdown sequence is as follows:

  • FIRST: The data nodes
  • SECOND: The master replica
  • LAST: The master

The remote collectors can be brought down at any time.  When shutting down the cluster, it is important to “bring the cluster offline”.  Thing of this as a graceful shutdown of all the services in a controlled manor.  You do this from the appliance admin page

1. Log into the admin ui…. https://<vrops-master>/admin/

vrops48

2. Once logged into the admin UI, click the “Take Offline” button at the top.  This will start the graceful shutdown of services running in the cluster.  Depending on the cluster size, this can take some time.

vrops49

3. Once the cluster reads offline, log into the vCenter where the cluster resides and begin shutting down the nodes, starting with the datanodes, master replica, and lastly the master.  The remote collectors can be shutdown at any time.

4. When ready, open a VM console to the master VM and power it on.  Watch the master power up until it reaches the following splash page example.  It may take some time, and SUSE may be running a disk check on the VM.  Don’t touch it if it is, just go get a coffee as this may take an hour to complete.

The proper startup sequence is as follows:

  • FIRST: The master
  • SECOND: The master replica
  • LAST: The data nodes, remote collectors

vrops4

5. Power on the master replica, and again wait for it to fully boot-up to the splash page example above.  Then you can power on all remaining data nodes altogether.

6. Log into the admin ui…. https://<vrops-master>/admin/

7. Once logged in, all the nodes should have a status of offline and in a state of Not running before proceeding.  If there are nodes with a status of not available, the node has not fully booted up.

vrops50

8. Once all nodes are in the preferred state, bring the cluster online through the admin UI.

Alternatively…..

If there was a need to shutdown the cluster from the back-end using the same sequence, but you should always use the Admin UI when possible:

Proper shutdown:

  • FIRST: The data nodes
  • SECOND: The master replica
  • LAST: The master

You would need to perform the following command to bring the slice offline.  Each node is considered to be a slice.  You would do this on each node.

# service vmware-vcops-web stop; service vmware-vcops-watchdog stop; service vmware-vcops stop; service vmware-casa stop
$VMWARE_PYTHON_BIN /usr/lib/vmware-vcopssuite/utilities/sliceConfiguration/bin/vcopsConfigureRoles.py --action=bringSliceOffline --offlineReason=troubleshooting

If there was a need to startup the cluster from the back-end using the same sequence, but you should always use the Admin UI when possible:

Proper startup:

  • FIRST: The master
  • SECOND: The master replica
  • LAST: The data nodes, remote collectors

You would need to perform the following command to bring the slice online.  Each node is considered to be a slice.  You would do this on each node.

# $VMWARE_PYTHON_BIN $VCOPS_BASE/../vmware-vcopssuite/utilities/sliceConfiguration/bin/vcopsConfigureRoles.py --action bringSliceOnline
# service vmware-vcops-web start; service vmware-vcops-watchdog start; service vmware-vcops start; service vmware-casa start

If there is a need to check the status of the running services on vROps nodes, the following command can be used.

# service vmware-vcops-web status; service vmware-vcops-watchdog status; service vmware-vcops status; service vmware-casa status