Manually starting vRealize Hyperic 5.8.X Appliance

I’ve had this happen to me on the 5.8.4 appliance and thought I would share.  Normally, the Hyperic appliance is deployed as a vApp consisting of two VMs, and when the vApp is started or restarted, they each start in the proper order.  This process might be needed if the database doesn’t shut down cleanly and therefore doesn’t start up right the next time; and if the database isn’t running, the Hyperic UI server won’t start.

Log in to the server with SSH as root, using the hqadmin password you specified during the vRealize Hyperic Appliance deployment, unless you have changed it since, of course…

First, start the PostgreSQL database on the hypericdb VM.  This service has to be started under the hqadmin account.

  • To check the status of the service run the following command:
# su -c '/opt/vmware/vpostgres/9.1/bin/pg_ctl status -D /opt/vmware/vpostgres/9.1/data/' - hqadmin
  • To start the service run the following command:
# su -c '/opt/vmware/vpostgres/9.1/bin/pg_ctl start -D /opt/vmware/vpostgres/9.1/data/' - hqadmin

Once the database is running, start the Hyperic server on the hyperic VM.  This service has to be started under the hyperic account.

  • You can check the status of the hyperic server service by running the following command:
# su -c '/opt/hyperic/server-5.8.4-EE/bin/hq-server.sh status' - hyperic
  • You can start the service by running the following command:
# su -c '/opt/hyperic/server-5.8.4-EE/bin/hq-server.sh start' - hyperic

 

You can follow whether the Hyperic server starts up properly by tailing the bootstrap log on the xx01-m-hyperic server.

# tail -f /opt/hyperic/server-5.8.4-EE/bin/logs/bootstrap.log
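If you want to script this, here is a minimal sketch under a couple of assumptions: the default paths and account names shown above, and that pg_ctl status and hq-server.sh status exit non-zero while their service is down (so each component is only started when its check fails).  The first block runs on the hypericdb VM, the second on the Hyperic server VM.

# hypericdb VM: start PostgreSQL only if the status check reports it down
su -c '/opt/vmware/vpostgres/9.1/bin/pg_ctl status -D /opt/vmware/vpostgres/9.1/data/' - hqadmin \
  || su -c '/opt/vmware/vpostgres/9.1/bin/pg_ctl start -D /opt/vmware/vpostgres/9.1/data/' - hqadmin

# Hyperic server VM: same pattern for the HQ server, then follow the bootstrap log
su -c '/opt/hyperic/server-5.8.4-EE/bin/hq-server.sh status' - hyperic \
  || su -c '/opt/hyperic/server-5.8.4-EE/bin/hq-server.sh start' - hyperic
tail -f /opt/hyperic/server-5.8.4-EE/bin/logs/bootstrap.log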

 

Hope this helps anyone out there who still uses vRealize Hyperic.

Enable TLS v1 In vCloud Director 8.20 and vCloud Availability 1.0

VMware’s vCloud Director (vCD) and vCloud Availability (vCAV) only come with TLS v1.1 and v1.2 enabled out of the box.  This process will show you how to enable TLS v1.  If more information is needed, please visit VMware’s documentation on vCloud Director 8.20, or the following KB2145796.  This work should be completed after hours, as you would inevitably be moving the VCD proxy service from one cell to another, and this could cause a brief outage for customers.  This process requires taking each cell offline, so do the cells one at a time, starting with a cell that is not running the inventory service.

  • Open an SSH session to a VCD cell, or vCAV cloud proxy cell, and su to root
  • Change to the ‘/opt/vmware/vcloud-director/bin/’ directory
  • Use the Cell Management Tool to quiesce the cell.  This will move active jobs over to another cell in preparation for a clean shutdown.  Make a note of which VCD cell has the proxy service enabled, and save that cell for last.
# ./cell-management-tool -u administrator cell --quiesce true
  • Get the status of any running jobs on the cell.  Verify: Job count = 0  |  Is Active = false  |  In Maintenance Mode = false
# ./cell-management-tool -u administrator cell --status

Example Output:

Job count = 0
Is Active = false
In Maintenance Mode = false
  • Shut the cell down to prevent any other jobs from becoming active on the cell.
# ./cell-management-tool -u administrator cell --shutdown

Example Output:

Cell successfully deactivated and all tasks cleared in preparation for shutdown
Stopping vmware-vcd-watchdog:                              [  OK  ]
Stopping vmware-vcd-cell:                                  [  OK  ]
  • Run the following command on the vCD cell in /opt/vmware/vcloud-director/bin/ to enable TLS v1
# ./cell-management-tool ssl-protocols -d SSLv3,SSLv2Hello
  • Start the cell service, then validate from the UI that a vCD cell is running the listener service and that vCenter is connected to one of the cells.
# service vmware-vcd start
  • To validate that TLS v1 has been enabled on the vCD cell, or vCAV cloud proxy cell, run the following command
# ./cell-management-tool ssl-protocols -l

Example output

Allowed SSL protocols:
* TLSv1.2
* TLSv1.1
* TLSv1
  • If you have additional VCD cells, or vCAV cloud proxy cells, repeat this process one at a time (a consolidated per-cell sketch follows below).
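For reference, here is the whole per-cell sequence in one place.  It is just the commands from the steps above, run as root on one cell at a time, and it assumes the default /opt/vmware/vcloud-director/bin/ path and an administrator account:

cd /opt/vmware/vcloud-director/bin/
./cell-management-tool -u administrator cell --quiesce true
./cell-management-tool -u administrator cell --status       # wait for Job count = 0, Is Active = false
./cell-management-tool -u administrator cell --shutdown
./cell-management-tool ssl-protocols -d SSLv3,SSLv2Hello    # TLSv1 is no longer in the disallowed list
service vmware-vcd start
./cell-management-tool ssl-protocols -l                     # confirm TLSv1 now shows as allowed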

Network Scanners Can Crash vRealize Operations Manager Tomcat Service On Large Clusters

If network scanners are deployed in your production environments, it may be necessary to white-list the vROps nodes, as the network scanners can bring the Tomcat service to its knees, especially on active vROps clusters.  In my case the network scanner was causing Tomcat to crash, so when users would attempt to access the main vROps UI, they’d get the following error:

Unable to connect to platform services

While troubleshooting this issue, I went through the sizing of the cluster and its performance, verified there was nothing backing up the vROps VMs, and even made sure the datastores and specific hosts were healthy.  I even tried replacing the “/usr/lib/vmware-vcops/user/plugins/inbound” directory and files on all nodes from the master copy in hopes that it would make the cluster healthy again and stop Tomcat from panicking.

The following was discovered after reviewing the /var/log/apache2/access_log on the master:

192.216.33.10 - - [10/Oct/2017:04:56:23 +0000] "GET /recipe/login.php?Password=%22'%3e%3cqqs%20%60%3b!--%3d%26%7b()%7d%3e&Username=&submit=Login HTTP/1.0" 301 362 "-" "-"
192.216.33.10 - - [10/Oct/2017:04:56:23 +0000] "GET /recipe/recipe/login.php?Password=%22'%3e%3cqqs%20%60%3b!--%3d%26%7b()%7d%3e&Username=&submit=Login HTTP/1.0" 301 369 "-" "-"
192.216.33.10 - - [10/Oct/2017:04:56:23 +0000] "GET /recipe/recipe_search.php?searchstring=alert(document.domain) HTTP/1.0" 301 326 "-" "-"
192.216.33.10 - - [10/Oct/2017:04:56:23 +0000] "GET /recipe/recipe/recipe_search.php?searchstring=alert(document.domain) HTTP/1.0" 301 333 "-" "-"
192.216.33.10 - - [12/Oct/2017:08:30:43 +0000] "GET /recipe_view.php?intId=char%2839%29%2b%28SELECT HTTP/1.1" 301 282 "-" "-"
192.216.33.10 - - [12/Oct/2017:08:31:06 +0000] "GET /modules.php?name=Search&type=stories&query=qualys&catebgory=-1%20&categ=%20and%201=2%20UNION%20SELECT%200,0,aid,pwd,0,0,0,0,0,0%20from%20nuke_authors/* HTTP/1.1" 301 410 "-" "-"
192.216.33.10 - - [12/Oct/2017:08:31:06 +0000] "GET /modules.php?name=Top&querylang=%20WHERE%201=2%20ALL%20SELECT%201,pwd,1,1%20FROM%20nuke_authors/* HTTP/1.1" 301 342 "-" "-"
192.216.33.10 - - [12/Oct/2017:08:31:10 +0000] "GET /index.php?option=com_jumi&fileid=-530%27%20UNION%20SELECT%202,concat%280x6a,0x75,0x6d,0x69,0x5f,0x73,0x71,0x6c,0x5f,0x69,0x6e,0x6a,0x65,0x63,0x74,0x69,0x6f,0x6e%29,null,null,null,0,0,1%20--%20%27 HTTP/1.1" 301 445 "-" "-"
192.216.33.10 - - [10/Oct/2017:04:20:19 +0000] "GET /recipe_view.php?intId=char%2839%29%2b%28SELECT HTTP/1.1" 301 282 "-" "-"
192.216.33.10 - - [10/Oct/2017:04:20:42 +0000] "GET /modules.php?name=Search&type=stories&query=qualys&category=-1%20&categ=%20and%201=2%20UNION%20SELECT%200,0,aid,pwd,0,0,0,0,0,0%20from%20nuke_authors/* HTTP/1.1" 301 410 "-" "-"
192.216.33.10 - - [10/Oct/2017:04:22:32 +0000] "GET /third_party/fckeditor/editor/_source/classes/fckstyle.js HTTP/1.1" 301 284 "-" "-"
192.216.33.10 - - [10/Oct/2017:04:22:32 +0000] "GET /third_party/tinymce/jscripts/tiny_mce/plugins/advlink/readme.txt HTTP/1.1" 301 292 "-" "-"
192.216.33.10 - - [10/Oct/2017:04:22:32 +0000] "GET /rsc/smilies/graysmile.gif HTTP/1.1" 301 253 "-" "-"
192.216.33.10 - - [10/Oct/2017:04:22:32 +0000] "GET /media/users/admin/faceyourmanga_admin_girl.png HTTP/1.1" 301 274 "-" "-"

 

The Tomcat service is being pushed to its limits and is using many more resources than planned.  There are upwards of 10,000 requests in bursts from a single IP address.  From the logs it certainly looks like an attack, but it’s coming from an internal IP address.
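If you want to confirm where a burst is coming from, a quick one-liner against the same access log counts requests per source IP (standard awk/sort/uniq, nothing vROps-specific):

awk '{print $1}' /var/log/apache2/access_log | sort | uniq -c | sort -rn | head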

My advice – get your security team to white-list your vROps appliances.

To restart the web service on all vROps nodes, either issue this command on each node: ‘service vmware-vcops-web restart’, or log into the admin page and take the cluster offline and then back online.
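If you have SSH access to every node, a rough loop saves some typing (the node names here are hypothetical placeholders; substitute your own):

for node in vrops-master vrops-replica vrops-data-01; do
  echo "== $node =="
  ssh root@"$node" 'service vmware-vcops-web restart'
done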

Install Hyperic Agent 5.8.x On SUSE 11 and SUSE 12 Based VMware Appliances

Let me start out by saying that if you’d like to install the Hyperic agent from vRealize Hyperic, a VMware platform that is nearing the end of its life (late 2018), you should first **make sure that having the agent installed on VMware’s SUSE-based appliance is supported.**

vRealize Hyperic is a terrific platform that unfortunately has reached the end of its product development life cycle, and will ultimately reach the end of support in late 2018.

With that said…

In this particular case I wanted to monitor the SUSE appliance virtual machines of VMware’s vCloud Availability, and I’m already using Hyperic to monitor our production environment management virtual machines…

  • To start the installation run:
# zypper install vcenter-hyperic-agent-5.8.4.EE-1.noarch.rpm

example output:

hyperic

  • Respond with:     a

example output:

hyperic2

  • Respond with:      y

example output:

hyperic3

UPDATE SYSTEM FIREWALL TO ALLOW TCP PORT 2144

  • Edit /etc/sysconfig/SuSEfirewall2 and update lines 281 and 379 with the addition of port 2144 for SUSE 11, or lines 253 and 351 with the addition of port 2144 for SUSE 12
  • Note: for listing multiple ports, SuSEfirewall2 uses a space-separated list, e.g. “1234 1234 1234”.  Add port 2144 where applicable.

Line 281 for SUSE 11, or line 253 for SUSE 12

FW_SERVICES_EXT_TCP="2144"

Line 379 for SUSE 11, or line 351 for SUSE 12

FW_SERVICES_INT_TCP="2144"
  • Stop and start the firewall so configuration is loaded
/etc/SuSEfirewall2 stop

Pause 5 seconds

/etc/SuSEfirewall2 start

UPDATE JAVA CONFIGURATION FOR SUSE 12

  • Edit /etc/init.d/hyperic-hqee-agent and copy the following line (line 17):  #export JAVA_HOME=/usr/lib/jvm/java-6-openjdk/jre
    • For SUSE 12-based VMware appliances, this needs to be updated to: export JAVA_HOME=/usr/java/jre-vmware
    • For SUSE 11-based VMware appliances, this needs to be updated to: export HQ_JAVA_HOME=/usr/java/default
  • Add the new line, save and quit

hyperic4

CONFIGURE THE AGENT

  • Prior to starting the service, be sure to uncomment and modify the agent.setup values in the agent.properties file in /opt/hyperic/hyperic-hqee-agent/conf:
 # vi /opt/hyperic/hyperic-hqee-agent/conf/agent.properties

Uncomment and modify lines 71 through 80

agent.setup.camIP=<hyperic server IP or FQDN>
agent.setup.camPort=7080
agent.setup.camSSLPort=7443
agent.setup.camSecure=yes
agent.setup.camLogin=hqadmin
agent.setup.camPword= <hqadmin_password>
agent.setup.agentIP=*default*
agent.setup.agentPort=*default*
agent.setup.resetupTokens=no
agent.setup.acceptUnverifiedCertificate=yes

Uncomment line 86

agent.setup.unidirectional=no

Modify line 204.  set to =true

accept.unverified.certificates=true
  • Save and exit the file (‘:wq’), or use the non-interactive sed sketch below.
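If you would rather script the edit than open vi, a rough sketch with GNU sed covers a few of the values (only three are shown; the Hyperic server FQDN is a placeholder you must replace, and the same pattern applies to the other agent.setup lines):

sed -E -i \
  -e 's/^#?agent\.setup\.camIP=.*/agent.setup.camIP=hyperic.example.com/' \
  -e 's/^#?agent\.setup\.camSecure=.*/agent.setup.camSecure=yes/' \
  -e 's/^#?accept\.unverified\.certificates=.*/accept.unverified.certificates=true/' \
  /opt/hyperic/hyperic-hqee-agent/conf/agent.properties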

START THE AGENT

# sh /opt/hyperic/hyperic-hqee-agent/bin/hq-agent.sh start

-= OR =-

#  /etc/init.d/hyperic-hqee-agent start
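Either way, you can check that the agent came up (hq-agent.sh also accepts a status argument in 5.8.x, as far as I recall), and the agent log is worth tailing if the default log location is in use:

# sh /opt/hyperic/hyperic-hqee-agent/bin/hq-agent.sh status
# tail -f /opt/hyperic/hyperic-hqee-agent/log/agent.log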

 

  • Now you should be able to log into the Hyperic UI and add the new server to inventory.

Upgrade Existing vRealize Operations Manager Add-on/Solution Paks

The following was recorded using a vRealize Operations Manager (vROps) 6.6 cluster; however, older versions of vROps can be upgraded the same way.

  • Log into the vROps environment, go to the Administration tab, and select solutions in the left column.
  • Here you can see all of the add-on/solutions paks that I have installed in this environment.  To upgrade an existing solution, simply click the green plus button.
Image.png
  • Browse for the new pak.  In this example I have selected the “Reset Default Content” option.  As the statement suggests, this can override policies, customized alerts, symptoms, etc. that may have been customized by your organization, forcing that work to be re-created.  However, I like using this option because I get those new changes and can adjust my monitoring accordingly.  Use it at your own discretion.

Image.png

  • Click ‘upload’
Image.png
  • Click ‘Next’
  • Read and accept the EULA if you so desire
  • Click ‘Next’

Now the installation process will begin.  This shouldn’t take longer than 5 minutes.

vrops54

  • Click Finish

vrops55

Now the latest version of the add-on/solution pak is installed and ready for use.  In most cases it will just pick up the configuration from older versions.
Image.png

Collecting Java Heap dump from vCloud Director Cells

You only need to generate the Java heap dump from one of the cells.  What you’ll need to succeed:

  • JCONSOLE
  • iptables disabled on the cell you are connecting to.
  • Disk space available on the cell to accommodate the dump – I believe these can be between 8 and 10 GB in size
  • Unless it’s an emergency, do this operation outside of normal business hours, as it will be CPU intensive for up to 3 minutes, can impact API call performance, and can potentially cause the VCD cell inventory service to hang.

Step #1: Disable iptables on the cell

  • ssh to the desired cell and run the following command:

# service iptables stop

Step #2: Connect with jconsole (java console)

  • domain credentials should work here depending on your environment
  • connect to port: 8999
  • connect to desired cell

vcd9

  • If you get the message “Secure connection failed. Retry Insecurely?”, just click the ‘Insecure’ button to continue

 

vcd10

Step #3: Generate the heap dump

  1. On the MBeans tab, in the com.sun.management/HotSpotDiagnostic object, select the Operations section.
  2. In dumpHeap parameters, enter the following information:

    p0: [heap-output-path]

    p1: true – perform a garbage collection before dumping the heap (only live objects are dumped)

    For example:

    p0: /opt/vmware/vcloud-director/vcd_cell_name_heap-dump-file.hprof

    p1: true

  3. Click the dumpHeap button.

vcd11

 

  • There will be no indication when the heap dump completes.  I just watch the size of the file on the cell until it stops growing.  This process typically takes less than two minutes.
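If you cannot get jconsole connected, an alternative sketch is to trigger the dump locally on the cell with jmap, assuming the cell’s JDK ships it.  The PID lookup below is a rough placeholder; verify you are targeting the cell’s Java process, and you may need to run jmap as the same user that owns that process:

# jmap -dump:live,format=b,file=/opt/vmware/vcloud-director/vcd_cell_name_heap-dump-file.hprof $(pgrep -f vcloud-director | head -1)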

Step #4: Cleanup and send-off

  • Locate the heap dump in /opt/vmware/vcloud-director/ and move it off to a location where you can compress it and upload it to the VMware FTP site as you would for logs.
  • Start iptables on the cell again: # service iptables start

Upgrading VMware vCloud Director to 8.20

This document was created while upgrading an existing vCloud Director 8.10.1 environment with an Oracle database and multiple cloud cells.

After downloading the latest version of vCloud Director 8.20 for service providers,  SCP the upgrade to all VCD cells.  You can review the release notes here.

What you’ll need to do before getting started:

  • SSH into each cell and ‘sudo su -‘ to root
  • Move the .bin to the root directory (see the loop sketch after this list)
  • chmod +x vmware-vcloud-director-distribution-8.20.0-5515092.bin
  • I strongly advise opening a support request with VMware before proceeding with the upgrade.  You may not need it, but it comes in handy to have one logged beforehand.
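If you have more than a couple of cells, staging the bin on all of them can be done in one rough loop (the cell hostnames are hypothetical placeholders, and this assumes you can SSH/SCP to each cell as root):

for cell in vcd-cell-01 vcd-cell-02 vcd-cell-03; do
  scp vmware-vcloud-director-distribution-8.20.0-5515092.bin root@"$cell":/root/
  ssh root@"$cell" 'chmod +x /root/vmware-vcloud-director-distribution-8.20.0-5515092.bin'
done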

Maintenance – Shutdown the cells

1. Open an SSH session into each VCD cell

 

2. Sudo to root using the following command:

# sudo su -

3. Change to the vcloud-director/bin/  directory

# cd /opt/vmware/vcloud-director/bin/

4. Use the Cell Management Tool to quiesce the cell.  This will move active jobs over to another cell.

# ./cell-management-tool -u administrator cell --quiesce true

5. Get the status of any running jobs on each cell.  Verify: Job count = 0  |  Is Active = false  |  In Maintenance Mode = false

# ./cell-management-tool -u administrator cell --status
Example Output:

vcd6

6. Shut the cell down to prevent any other jobs from becoming active on the cell.  This command will also allow active jobs to finish cleanly.

# ./cell-management-tool -u administrator cell --shutdown

Example Output:

vcd7

7. Get a status on the cells to be sure everything is down

# service vmware-vcd status

8. Now complete steps 4 – 7 on the remaining cells to cleanly shut down the vCD service on all cells.

9. Here is where I would shut down the VCD cell virtual machines and the database to get a clean snapshot of each while the environment is powered off.

10. Power the database virtual machine back on first; once it is fully up, power on the VCD cell virtual machines.

11. Log back into the vCloud Director environment to verify functionality before the upgrade.

12. SSH to all VCD cell virtual machines and use the following command to stop the service again on each cell.  Here there is an assumption made that we are now well within a maintenance window.

# service vmware-vcd stop

Starting The vCloud Director Upgrade

1. Start with the first cell, and run the first half of the upgrade.  DO NOT upgrade the database yet.

# ./vmware-vcloud-director-distribution-8.20.0-5515092.bin

Example Output:

vCD1

2. Respond with: y

Example Output:

vcd2

3. Stop.  Now you need to run steps one and two on the rest of the vCloud Director Cells, and install the upgrade.  Do them one at a time.  DO NOT upgrade the database yet.

4. Now that all cells have been upgraded, go back to the first cell and run the database upgrade.

# /opt/vmware/vcloud-director/bin/upgrade

Example vCD Database upgrade output:

vcd3

5. Respond with: y

vcd4

6. Start the first cell by responding with ‘y’

vcd8

7. Manually start the VCD service on the remaining cells

# service vmware-vcd start

8. Get the VCD status of all cells by running the following command on each

# service vmware-vcd status
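With several cells, a quick loop from a jump host saves logging into each one (cell hostnames are hypothetical placeholders; this assumes root SSH access to the cells):

for cell in vcd-cell-01 vcd-cell-02 vcd-cell-03; do
  echo "== $cell =="
  ssh root@"$cell" 'service vmware-vcd status'
done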

9. Log into the vCD UI, and watch/wait for vCenter to sync with vCD under the Manage & Monitor section → vCenters.  This normally takes 30 minutes or so.  Once done, the status will change from a spinning circle to a green check mark.

10. Run some environment validation tests to be sure everything is working properly, and then delete the snapshots taken earlier.

 

Upgrading NSX from 6.2.4 to 6.2.8 In a vCloud Director 8.10.1 Environment

We use NSX to serve up the edges in a vCloud Director environment currently running 8.10.1.  One important caveat to note here is that once you upgrade an NSX 6.2.4 appliance in this configuration, you will no longer be able to redeploy the edges in vCD until you upgrade and redeploy each edge first in NSX.  Then, and only then, will subsequent redeploys in vCD work.  The cool thing, though, is that VMware finally has a decent error message that displays in vCD: if you try to redeploy an edge before upgrading it in NSX, you’ll see an error message similar to:

—————————————————————————————————————–

“[ 5109dc83-4e64-4c1b-940b-35888affeb23] Cannot redeploy edge gateway (urn:uuid:abd0ae80) com.vmware.vcloud.fabric.nsm.error.VsmException: VSM response error (10220): Appliance has to be upgraded before performing any configuration change.”

—————————————————————————————————————–

Now we get to the fun part – The Upgrade…

A little prep work goes a long way:

  • If you have a support contract with VMware, I HIGHLY RECOMMEND opening a support request with VMware and detailing your upgrade plans with GSS, along with the date of the upgrade.  This allows VMware to have a resource available in case the upgrade goes sideways.
  • Make a clone of the appliance in case you need to revert (keep powered off)
  • Set host clusters DRS where vCloud Director environment/cloud VMs are to manual (keeps VMs/edges stationed in place during upgrade)
  • Disable HA
  • Do a manual backup of NSX manager in the appliance UI

Shutdown the vCloud Director Cell service

  • It is highly advisable to stop the vcd service on each of the cells in order to prevent clients in vCloud Director from making changes during the scheduled outage/maintenance.  SSH to each vcd cell and run the following in each console session:
# service vmware-vcd stop
  • A good rule of thumb is to now check the status of each cell to make sure the service has been disabled.  Run this command in each cell console session:
# service vmware-vcd status
  • For more information on these commands, please visit the following VMware KB article: KB1026310

Upgrading the NSX appliance to 6.2.8

  1. Log into NSX manager and the vCenter client
  2. Navigate to Manage→ Upgrade

nsx4

  3. Click the ‘Upgrade’ button
  4. Click the ‘Choose File’ button
  5. Browse to the upgrade bundle and click Open
  6. Click the ‘Continue’ button; the install bundle will be uploaded and installed.

 

nsx1

nsx2

  7. You will be asked whether you would like to enable SSH and join the customer experience improvement program
  8. Verify the upgrade version, and click the Upgrade button.

nsx5

  9. The upgrade process will automatically reboot the NSX Manager VM in the background. Having the console up will show this.  Don’t trust the ‘uptime’ displayed in vCenter for the VM.
  10. Once the reboot has completed, the GUI will come up quickly, but it will take a while for the NSX management services to change to the running state. Give the appliance 10 minutes or so to come back up, and take the time now to verify the NSX version. If you are using Guest Introspection, wait until the red flags/alerts clear on the hosts before proceeding.
  11. In the vSphere web client, make sure you see ‘Networking & Security’ on the left side.  If it does not show up, you may need to SSH into the vCenter appliance and restart the web client service.  Otherwise continue to step 12.
# service vsphere-client restart

12. In the vSphere web client, go to Networking & Security -> Installation and select the Management tab.  You have the option to select your controllers and download a controller snapshot.  Otherwise, click the “Upgrade Available” link.

nsx8

13. Click ‘Yes’ to upgrade the controllers.  Sit back and relax.  This part can take up to 30 minutes.  You can click the page refresh in order to monitor progress of the upgrades on each controller.

nsx9

14.  Once the upgrade of the controllers has completed, SSH into each controller and run the following in the console to verify that it indeed has a connection back to the appliance:

# show control-cluster status

15. On the ESXi hosts/blades in each chassis, I would run this command just as a sanity check to spot any NSX controller connection issues (the hosts connect to the controllers on TCP port 1234):

 esxcli network ip connection list | grep 1234
  • If all controllers are connected you should see something similar in your output

nsx10

  • If the controllers are not in a healthy state, you may get something similar to this next image in your output.  If this is the case, you can first try rebooting the affected controller.  If that doesn’t work…..weep in silence. Then call VMware using the SR I strongly suggested creating before the upgrade, and GSS or your TAM can get you squared away.

nsx11

16.  Now in the vSphere web client, if you go back to Networking & Security -> Installation -> Host Preparation, you will see that there is an upgrade available for the clusters.  Depending on the size of your environment, you may choose to do the upgrade now or at a later time outside of the planned outage.  Either way, you would click the target cluster’s ‘Upgrade Available’ link and select Yes.  Reboot one host at a time so that the VIBs are installed in a controlled fashion. If you simply click Resolve, the host will attempt to go into maintenance mode and reboot.

17. After the new VIBs have been installed on each host, run the following command to be sure they have the new VIB version:

# esxcli software vib list | grep -E 'esx-dvfilter|vsip|vxlan'

Start the vCloud Director Cell service

  • On each cell run the following commands

To start:

# service vmware-vcd start

Check the status after:

# service vmware-vcd status
  • Log into VCD, and by now the inventory service should be syncing with the underlying vCenter.  I would advise waiting for it to complete, then running some sanity checks (provision orgs and edges, upgrade edges, etc.).

Performing A Database Health Check On vRealize Operation Manager (vROPS) 6.5

In a previous post I showed how you could perform a health check, and possibly resolve database load issues, in vROps versions 6.3 and older. When VMware released vROps 6.5, they changed the way you access the nodetool utility that ships with the Cassandra database.

 $VCOPS_BASE/cassandra/apache-cassandra-2.1.8/bin/nodetool --port 9008 status

For the 6.5 release and newer, they added the requirement of using a ‘maintenanceAdmin’ user along with a password file.  The new command to check the load status of the activity tables in vROps 6.5+ is as follows:

  $VCOPS_BASE/cassandra/apache-cassandra-2.1.8/bin/nodetool -p 9008 --ssl -u maintenanceAdmin --password-file /usr/lib/vmware-vcops/user/conf/jmxremote.password status

Example output would be something similar to this if your cluster is in a healthy state:

vrops51

If any of the nodes have over 600 MB of load, you should consult with VMware GSS or a TAM on the next steps to take, and how to alleviate the load issues.

Next we can check the syncing status of the cluster to determine overall health.  The command is as follows:

$VMWARE_PYTHON_BIN /usr/lib/vmware-vcops/tools/vrops-platform-cli/vrops-platform-cli.py getShardStateMappingInfo

Example output:

vrops52

The “vRealize Ops Shard” refers to the data nodes, as well as the Master and Master Replica nodes, in the main cluster. The available statuses are RUNNING, SYNCING, BALANCING, OUT_OF_BALANCE, and OUT_OF_SYNC.

  • Out of Balance and Out of Sync should be enough to open an SR and have VMware take a look.

Lastly, we can take a look at the size of the activity table.  You can do this by running the following command:

 du -sh /storage/db/vcops/cassandra/data/globalpersistence/activity_tbl-*

Example Output:

vrops53

If there are two listed here, you should consult with VMware GSS as to which one can safely be removed, as one would be left over from a previous upgrade.
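If you run these checks regularly, a small wrapper sketch strings together the exact commands above (run as root on the master node; it assumes $VCOPS_BASE and $VMWARE_PYTHON_BIN are set in the shell, as they are in an interactive root session on the appliance):

echo "--- Cassandra node load ---"
$VCOPS_BASE/cassandra/apache-cassandra-2.1.8/bin/nodetool -p 9008 --ssl -u maintenanceAdmin --password-file /usr/lib/vmware-vcops/user/conf/jmxremote.password status

echo "--- Shard state mapping ---"
$VMWARE_PYTHON_BIN /usr/lib/vmware-vcops/tools/vrops-platform-cli/vrops-platform-cli.py getShardStateMappingInfo

echo "--- Activity table size ---"
du -sh /storage/db/vcops/cassandra/data/globalpersistence/activity_tbl-*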

 

Removing Old vROps Adapter Certificates

I’ve come across this issue in versions of vRealize Operations Manager prior to the 6.5 release, where you delete an adapter for data collection, like vSphere, NSX or VCD, and immediately try to re-create it.  Whether it was a timing issue, or vROps just didn’t successfully complete the deletion process, I’d typically get an error that the new adapter instance could not be created because a previous one exists with the same name.  There are two ways around this.  You can connect the adapter to whatever instance (VCD, NSX, vSphere) you are trying to collect data from using the IP address instead of the FQDN (or vice-versa), or you can manually clean up the certificate that was left behind, as I outline in the steps below.

To resolve the issue, delete the existing certificate from the Cassandra DB and accept the new certificate when re-creating the adapter instance.

1. Take snapshots of the cluster

2.  SSH to the master node.  Access the Cassandra DB by running the following command:

 $VMWARE_PYTHON_BIN $VCOPS_BASE/cassandra/apache-cassandra-2.1.8/bin/cqlsh --ssl --cqlshrc $VCOPS_BASE/user/conf/cassandra/cqlshrc

3. Access the database by running the following command:

use globalpersistence;

4.  We will need to look at the entries in the global persistence certificate store.  To do this, first list all the entries in the globalpersistence.certificate store by running the following command:

SELECT * from globalpersistence.certificate;

5. From the list, find the desired certificate.  Now select that specific certificate with the following command:

 SELECT * from globalpersistence.certificate where key = 'Certificate.<ThumbprintOfVCCert>' and classtype = 'certificate' ALLOW FILTERING;

For example:

 SELECT * from globalpersistence.certificate where key = 'Certificate.e88b13c9e346633f94e46d2a74182d219a3c98b2' and classtype = 'certificate' ALLOW FILTERING;
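If you are not sure of the thumbprint, a rough way to derive it from the endpoint itself is shown below.  It assumes the key is the lowercase SHA-1 fingerprint without colons (which matches the example above), and vcenter.example.com is a placeholder for the adapter’s target:

echo | openssl s_client -connect vcenter.example.com:443 2>/dev/null | openssl x509 -noout -fingerprint -sha1 | cut -d= -f2 | tr -d ':' | tr 'A-F' 'a-f'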

6.  The table contains the following columns:

namespace | classtype | key | blobvalue | strvalue | valuetype | version
----------+-----------+-----+-----------+----------+-----------+--------

7.  Select the Key which matches the Thumbprint of the Certificate you wish to remove and run the following command:

 DELETE FROM globalpersistence.certificate where key = 'Certificate.<ThumbprintOfVCCert>' and classtype = 'certificate' and namespace = 'certificate';

For example:

 DELETE FROM globalpersistence.certificate where key = 'Certificate.e88b13c9e346633f94e46d2a74182d219a3c98b2' and classtype = 'certificate' and namespace = 'certificate';

8.  Verify that the Certificate has been removed from the VMware vRealize Operations Manager UI by navigating to:

Administration > Certificates

9.  Click the gear icon on the vSphere solution to configure it.

10.  Click the icon to create a new instance. Do not remove the existing instance unless its data can be lost.  If the old instance has already been deleted prior to this operation, then this warning can be ignored.

11.  Click Test Connection and the new certificate will be imported.

12.  Upon clicking Save, there will be an error stating that the Resource Key already exists. Ignore this and click Close; the UI will ask “Discard Changes?”. Click Yes.

13.  Upon clicking the Certificates tab, the certificate is shown for the existing VC instance.  Now you should have a new adapter configured and collecting.  If you kept the old adapter for its data, it can safely be removed after the data retention period has expired.