NSX SSL Certificate Failure on ESXi: SSL handshake failed

Some time ago I was having an issue putting a host back into service in an NSX environment.  In Log Insight, and in /var/log/netcpa.log on the host, I was seeing errors similar to the following:

2018-05-26T11:07:50.486Z [FFD53B70 error 'Default'] SSL handshake failed on : error = SSL Exception: error:140000DB:SSL routines:SSL routines:short read
2018-05-26T11:07:55.545Z [FFD12B70 error 'Default'] SSL handshake failed on : error = SSL Exception: error:140000DB:SSL routines:SSL routines:short read
2018-05-26T11:08:00.600Z [FFD12B70 error 'Default'] SSL handshake failed on : error = SSL Exception: error:140000DB:SSL routines:SSL routines:short read

Browsing through VMware’s archive, I came across KB2151089, which is very similar to the issue I was having.  However, upgrading to NSX 6.3.5 was not an option at the time.  I remembered a similar issue at my previous workplace, and dug through my Evernote archive to find my notes.

Before we continue, this should go without saying, but your mileage may vary, and I’d recommend opening a ticket with VMware’s GSS.  At the very least you should test this process out in a lab.

The steps outlined here will resolve the issue.  Keep in mind that at this point the host is not in production and is currently in maintenance mode:

  • Determine if the NSX controllers are connected by logging into the ESXi host, and running the following commands:
# esxcli network ip connection list |grep 1234

— and —

# esxcli network ip connection list |grep 5671
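
A bare grep for 1234 can also match unrelated text, such as a local ephemeral port or a world ID that happens to contain those digits.  As a rough sketch (not part of VMware's tooling), the filter below counts only ESTABLISHED connections whose remote port matches.  The function name count_established is my own, and the column positions (foreign address in column 5, state in column 6) are an assumption worth checking against your own output:

```shell
# Count ESTABLISHED connections whose foreign (remote) port equals $1.
# Reads the output of `esxcli network ip connection list` on stdin.
# Assumption: column 5 is the foreign address and column 6 is the state.
count_established() {
    awk -v port="$1" '
        $6 == "ESTABLISHED" {
            # The port is whatever follows the last colon of the address.
            n = split($5, parts, ":")
            if (parts[n] == port) c++
        }
        END { print c + 0 }
    '
}
```

On the host it would be fed from the real command, for example: esxcli network ip connection list | count_established 1234.  A result of 0 would suggest there is no established session on that port.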


  • Next, log into the NSX appliance and back up the config.  While the config backup is taking place, get the ESXi host mob ID from the vCenter mob page https://<vcsa-fqdn>/mob
  1. select the link for the ‘root folder‘, eg. group-d1
  2. select the link for the ‘child entity‘ eg. datacenter-2
  3. select the link for the ‘host folder‘ eg. group-h4
  4. select the link for the ‘child entity‘ eg. domain-c7
  5. Now locate the ‘host‘ and find the host-xxxx value. eg: host-1234 
  • After the NSX backup is complete, ssh into the NSX manager.  Root access to the appliance will be needed, so at the command prompt:
  1. Enter ‘en‘ and then enter the ‘admin’ password
  2. Enter ‘st en‘ and enter the following password: IAmOnThePhoneWithTechSupport 
  • Log into the sql prompt
# psql -U secureall
  • Issue the following command to verify that there is a record associated with the host mob ID.  Below is an example using host-1234
# select host_uuid,node_uuid,thumbprint from vnvp_host_key where host_uuid='host-1234';

Example output:

 host_uuid |              node_uuid               | thumbprint
 host-1234 | a2a68660-515e-4f87-811d-306c54b0b2e8 | AD:58:C0:84:FF:DF:5E:95:50:B7:63:2E:3F:B2:67:22:56:F7:DC:9B
(1 row)

  • Next, in vCenter move the host to an isolation cluster.  We will need to validate the NSX vibs installed by running the following command on the host:
# esxcli software vib list |grep -E 'esx-dvfilter-switch-security|vsip|vxlan'


Example output:

esx-dvfilter-switch-security   6.3.1-0.0.5124716  VMware  VMwareCertified  2017-02-28
esx-vsip                       6.3.1-0.0.5124716  VMware  VMwareCertified  2017-02-28
esx-vxlan                      6.3.1-0.0.5124716  VMware  VMwareCertified  2017-02-28


  • Remove the NSX vibs with the following commands:
# esxcli software vib remove -n esx-vxlan
# esxcli software vib remove -n esx-vsip
# esxcli software vib remove -n esx-dvfilter-switch-security


  • Return to the NSX terminal window and delete the record at the secureall=# prompt.  The example below uses ‘host-1234’.
# delete from vnvp_host_key where host_uuid='host-1234';


  • Reboot the ESXi host.  Once the host has rebooted, put the host back into the proper cluster.  To be safe, I would temporarily turn down DRS (move slider left), and exit maintenance mode.
  • We can validate that the host looks proper in vSphere web UI: ‘Network & Security -> Installation -> Host Preparation Tab‘ .
  • Click the ‘Resolve‘ link next to the cluster name


  • Once the tasks are all completed you can run the ‘esxcli software vib list….‘ command again to see that the three vibs have been installed.
  • Test that the vxlan network is functioning on the host.
  • Verify that the SSL Exception is no longer showing in the /var/log/netcpa.log.
  • If there are no errors, then the host is all set to be put back into service.
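
For the log check in the last steps, old entries from before the fix will still be in the file, so it helps to count only the failures logged after the reboot.  Here is a minimal sketch, assuming each line starts with an ISO-8601 UTC timestamp as in the excerpt at the top of this post.  The function name ssl_failures_since is my own:

```shell
# Count "SSL handshake failed" lines logged at or after a given timestamp.
# $1 = timestamp prefix (e.g. 2018-05-26T11:08:00), $2 = log file path.
# Assumption: the first field of each line is an ISO-8601 UTC timestamp,
# so a plain string comparison orders entries chronologically.
ssl_failures_since() {
    awk -v since="$1" '
        # Appending "" forces a string (not numeric) comparison.
        /SSL handshake failed/ && ($1 "") >= (since "") { n++ }
        END { print n + 0 }
    ' "$2"
}
```

For example, ssl_failures_since 2018-05-26T12:00:00 /var/log/netcpa.log, with the timestamp set to roughly when the host came back up.  A count of 0 is what you are hoping for.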




VMworld 2018 is right around the corner! Where will you be?

It’s almost that time of year again….  Some might even call it that special time of year when VMware geeks from across the globe converge on VMworld.  One might even consider this summer camp, and like anyone who has experienced it before, you meet new people in the vCommunity, make friends, and part ways after a week of technical sessions, social gatherings, and just straight-up shop talk, war stories, and the sharing of ideas.  Personally, this will be my third year attending, and I am super excited to be going.  This conference means enough to me that, due to other circumstances that happened early this year, I purchased my own pass to ensure that I wouldn’t miss out.

Now is the perfect time to cash in on those early bird discounts on conference passes, good until June 15.  Why wouldn’t you want to save a couple hundred dollars on one of the best IT conferences of the year?  For an individual, it’s $1,795 vs $2,095.  That’s before other discounts that may be applied, like VMUG memberships, or the discount for VMware Certified Professionals who hold an active VCP.

So, why go to VMworld?

I think for many first timers, there’s a certain electricity, and excitement about going.  Let me be the first to tell you, that feeling…. never really goes away.  Like the past couple of years, VMworld in the US will be held once again in Las Vegas.


I personally love coming to VMworld and have looked forward to it every year.  There’s always good energy here; the minute you get off the plane, it is happy.  Every experience I’ve had here is fun, and people genuinely are in a good mood.  This conference gives attendees the chance to attend VMware-led and partner-led sessions on platforms you may have thought about using or are currently using.  These sessions are meant to share best practices with the community, transfer knowledge on ways to use VMware platforms, and also give you a chance to ask the experts, many of whom work for VMware and, in some cases, are very involved with the development of the platforms you use.

VMworld is not just about attending sessions, however.  This conference gives you the unique opportunity to network with other IT professionals from across the globe and establish relationships that you would otherwise never have the chance to build.  Like it did for me, this conference may also inspire you to join the vCommunity, a thriving community of professionals who not only share their knowledge with others, but who also need help themselves.  I think we can all agree that no two environments/businesses are alike, and we have all used VMware’s platforms in ways that were intended, and in ways that even VMware might not have ever considered.  Members of the vCommunity take it upon themselves to share their experiences through blogs, social media, and support forums to help others.  This conference gives us a chance to get together, share war stories from our time in the trenches, and many times, you will find attendees getting together to engineer and develop something cool.

The VMware {code} group has even put together a hackathon, where members from the vCommunity can get together while at VMworld to develop some amazing things, and sometimes there are even prizes to be had for the coolest of the cool ideas.  But don’t let those words “code” or “hackathon” scare you.  These sessions are not just for developers!  Sure, coding experience will certainly help, but the power of the community enables you to participate in these teams anyway.  You may not be able to contribute code, but you can still contribute ideas to the team, and you might even pick up a few coding skills in the fun.  Let’s face it; some pretty cool ideas are cooked up during hackathons.  VMware’s internal hackathon cooked up the idea to bring VR into the datacenter, and allow you to virtually move your workloads from on-premises data centers into the cloud.  It’s freakin VR man!  How cool is that?


The VMworld conference also affords you the opportunity to attend instructor-led labs, along with VMware’s hands-on labs that you can also experience from home.  While at the conference, there will be many vendors out on the floor where you can experience new products, ask questions about products that you already use, and let’s not forget the vendor hall crawl, where there will be free adult beverages, snacks, and cool swag that vendors are giving out.  All can be found in the Solutions Exchange area.


I’m not going to lie, the parties at VMworld are pretty wild too.  Not saying that should be the only reason you go, but it is a good way to mingle with other conference attendees, jam out to some good music, and of course escape the Las Vegas heat.  VMworld of course wraps up with its own party, before the last day of the conference.


So what are you waiting for?  I can’t think of any reason not to attend the US 2018 VMworld in Las Vegas, August 26th – 30th, or the Europe 2018 VMworld in Barcelona, November 5th – 8th.  Follow this link here, and I will see you at the conference in Las Vegas!  Remember to take advantage of those early bird rates, good until June 15th!  REGISTER HERE FOR VMWORLD 2018



Add Custom Recommendation to vROps alert definition for versions prior to 6.6

  • This is useful when a new SOP document is created, because we can link to it directly in the alert email that is sent.


  1. Log into the main vRealize Operations Manager page.
  2. Click Content and then Recommendations

  3. On this page you can create, edit and delete custom recommendations.  Click the green plus sign to create a new custom recommendation.
  4. Here you can enter the text for the custom recommendation.  Paste the link to the SOP, highlight it, and then click the hyperlink icon.  Now paste the link again and click OK.  The “actions” section will allow the use of automated functions if you were looking at the triggered alert in vROps.  For now, just click save.
  5. Now you can add the custom recommendation to an alert definition.  Click Content and then Alert Definitions.
  6. Search for the alert that you would like to add the SOP to, select it and click the edit button.
  7. Click on section 5: Add Recommendations and then click on the plus sign
  8. Now you will need to search for the new SOP recommendation you just created, so search for SOP, find it in the list on the left, click and drag to position under the Recommendations section.
  9. Finally click save.  Now when this alert is triggered, and an email is sent, there will be a clickable link in the email to the SOP document.

The Home Lab Hardware



I decided to go with a Supermicro build as I wanted something power efficient, yet expandable, and this motherboard supports up to 128GB of ECC RDIMM DDR4 2133MHz server grade memory.  Now with this setup, when I feel the need to expand out my lab, I can build two more nodes, and I’ll have a rather nice VSAN cluster.  However I’m hoping the cost of DDR4 memory will have come down by then…

I did look at the Supermicro SYS-E300-8D and SYS-E200-8D style micro servers, but like most, I was concerned about the fan noise, and thus decided to go with a slightly larger chassis to get the larger fan.  Honestly, the fan in the unit I bought makes no more noise than a regular desktop computer.

Here’s my hardware:



  • SUPERMICRO MBD-X10SDV-TLN4F-O Mini ITX Server Motherboard, Xeon processor D-1541, FCBGA 1667
  • Black Diamond Memory 64GB (2 x 32GB) 288-Pin DDR4 SDRAM ECC Registered DDR4 2133 (PC4 17000) Server Memory, Model BD32GX22133MQR26
  • WD Blue M.2 250GB Internal SSD Solid State Drive – SATA 6Gb/s – WDS250G1B0B
  • (x 2) SAMSUNG 850 PRO 2.5″ 512GB SATA III 3D NAND Internal Solid State Drive (SSD) MZ-7KE512BW
  • SUPERMICRO CSE-721TQ-250B Black Mini-Tower Server Case, 250W Flex ATX Multi-output Bronze Power Supply




Who doesn’t love some internal shots after the lab-box has been put together?  🙂



In the coming blog posts, I’ll be building out my lab.  Stay tuned….

Manually starting vRealize Hyperic 5.8.X Appliance

I’ve had this happen to me on the 5.8.4 appliance and thought I would share.  Normally the Hyperic appliance is deployed as a vApp consisting of two VMs, and when the vApp is started/restarted, they each start in the proper order.  This process might be needed if the database doesn’t shut down cleanly and thus doesn’t start up right the next time.  And if the database isn’t running, the Hyperic UI server won’t start.

Log in to the server over SSH as root, using the hqadmin password that you specified during the vRealize Hyperic Appliance deployment, unless you have changed it of course…

First start the PostgreSQL database: hypericdb.  This service has to be started under the hqadmin account.

  • To check the status of the service run the following command:
# su -c '/opt/vmware/vpostgres/9.1/bin/pg_ctl status -D /opt/vmware/vpostgres/9.1/data/' - hqadmin
  • To start the service run the following command:
# su -c '/opt/vmware/vpostgres/9.1/bin/pg_ctl start -D /opt/vmware/vpostgres/9.1/data/' - hqadmin

Once the database is running, start the hyperic server: hyperic.  This service has to be started under the hyperic account.

  • You can check the status of the hyperic server service by running the following command:
# su -c '/opt/hyperic/server-5.8.4-EE/bin/./hq-server.sh status' - hyperic
  • You can start the service by running the following command:
# su -c '/opt/hyperic/server-5.8.4-EE/bin/./hq-server.sh start' - hyperic


You can confirm that the Hyperic server starts properly by following the bootstrap log on the xx01-m-hyperic server.

# tail -f /opt/hyperic/server-5.8.4-EE/bin/logs/bootstrap.log


Hope this helps anyone out there who still uses vRealize Hyperic.







Enable TLS v1 In vCloud Director 8.20 and vCloud Availability 1.0

VMware’s vCloud Director (vCD) and vCloud Availability (vCAV) only come with TLS v1.1 and 1.2 enabled out of the box.  This process will show you how to enable TLS v1.  If more information is needed, please visit VMware’s documentation on vCloud Director 8.20, or KB2145796.  This work should be completed after hours, as you will inevitably be moving the vCD proxy service from one cell to another, which could cause a brief outage for customers.  The process requires taking the cell offline, so work through the cells one at a time, starting with a cell that is not running the inventory service.

  • Open an SSH session to a vCD cell, or vCAV cloud proxy cell, and su to root
  • Change to the ‘ /opt/vmware/vcloud-director/bin/ ‘ directory
  • Use the Cell Management Tool to quiesce the cell.  This will move active jobs over to another cell, and cleanly shutdown the cell.  You should make note which VCD cell has the proxy service enabled, and avoid that cell until last.
# ./cell-management-tool -u administrator cell --quiesce true
  • Get the status of any running jobs on each cell.   ** Verify Job count = 0   |  Is Active = false  | In Maintenance Mode  = false
# ./cell-management-tool -u administrator cell --status

Example Output:

Job count = 0
Is Active = false
In Maintenance Mode = false
  • Shut the cell down to prevent any other jobs from becoming active on the cell.
# ./cell-management-tool -u administrator cell --shutdown

Example Output:

Cell successfully deactivated and all tasks cleared in preparation for shutdown
Stopping vmware-vcd-watchdog:                              [  OK  ]
Stopping vmware-vcd-cell:                                  [  OK  ]
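  • Run the following command on the vCD cell in /opt/vmware/vcloud-director/bin/ to enable TLS v1.  The -d option lists the protocols to disable; leaving TLSv1 out of that list is what enables it.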
# ./cell-management-tool ssl-protocols -d SSLv3,SSLv2Hello
  • Start the cell service, and validate that a vCD cell has the listener service running from the UI, and that vCenter is connected to one of the cells.
# service vmware-vcd start
  • To validate that TLS v1 has been enabled on the vCD cell, or vCAV cloud proxy cell, run the following command
# ./cell-management-tool ssl-protocols -l

Example output

Allowed SSL protocols:
* TLSv1.2
* TLSv1.1
* TLSv1
  • If you have additional VCD cells, or vCAV cloud proxy cells, repeat this process one at a time.
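
When repeating this across several cells, the status check before shutdown is the step I would least want to eyeball in a hurry.  Here is a small sketch that reads the --status output and succeeds only when the job count is 0 and the cell is no longer active.  The function name cell_is_idle is my own, and the matching is based purely on the example output shown above:

```shell
# Succeed (exit 0) only if the status text on stdin reports
# "Job count = 0" and "Is Active = false".
# Assumption: the wording matches the example output in this post.
cell_is_idle() {
    status=$(cat)
    printf '%s\n' "$status" | grep -Eq 'Job count *= *0([^0-9]|$)' &&
        printf '%s\n' "$status" | grep -Eq 'Is Active *= *false'
}
```

Something like ./cell-management-tool -u administrator cell --status | cell_is_idle && echo "OK to shut down" would be the idea, though the tool prompts for a password, so check how it behaves when piped in your environment before relying on it.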









Network Scanners Can Crash vRealize Operations Manager Tomcat Service On Large Clusters

If network scanners are deployed in your production environments, it may be necessary to white-list the vROps nodes, as the network scanners can bring the tomcat service to its knees, especially on active vROps clusters.  In my case the network scanner was causing tomcat to crash, so when users attempted to access the main vROps page, they’d get the following error:

Unable to connect to platform services

While troubleshooting this issue, I went through the sizing of the cluster, checked performance, verified that nothing was backing up the vROps VMs, and even made sure the datastores and specific hosts were healthy.  I even tried replacing the “/usr/lib/vmware-vcops/user/plugins/inbound” directory and files on all nodes with the master copy, in hopes that it would make the cluster healthy again and stop tomcat from panicking.

The following was discovered after reviewing the /var/log/apache2/access_log on the master:

- - [10/Oct/2017:04:56:23 +0000] "GET /recipe/login.php?Password=%22'%3e%3cqqs%20%60%3b!--%3d%26%7b()%7d%3e&Username=&submit=Login HTTP/1.0" 301 362 "-" "-"
- - [10/Oct/2017:04:56:23 +0000] "GET /recipe/recipe/login.php?Password=%22'%3e%3cqqs%20%60%3b!--%3d%26%7b()%7d%3e&Username=&submit=Login HTTP/1.0" 301 369 "-" "-"
- - [10/Oct/2017:04:56:23 +0000] "GET /recipe/recipe_search.php?searchstring=alert(document.domain) HTTP/1.0" 301 326 "-" "-"
- - [10/Oct/2017:04:56:23 +0000] "GET /recipe/recipe/recipe_search.php?searchstring=alert(document.domain) HTTP/1.0" 301 333 "-" "-"
- - [12/Oct/2017:08:30:43 +0000] "GET /recipe_view.php?intId=char%2839%29%2b%28SELECT HTTP/1.1" 301 282 "-" "-"
- - [12/Oct/2017:08:31:06 +0000] "GET /modules.php?name=Search&type=stories&query=qualys&catebgory=-1%20&categ=%20and%201=2%20UNION%20SELECT%200,0,aid,pwd,0,0,0,0,0,0%20from%20nuke_authors/* HTTP/1.1" 301 410 "-" "-"
- - [12/Oct/2017:08:31:06 +0000] "GET /modules.php?name=Top&querylang=%20WHERE%201=2%20ALL%20SELECT%201,pwd,1,1%20FROM%20nuke_authors/* HTTP/1.1" 301 342 "-" "-"
- - [12/Oct/2017:08:31:10 +0000] "GET /index.php?option=com_jumi&fileid=-530%27%20UNION%20SELECT%202,concat%280x6a,0x75,0x6d,0x69,0x5f,0x73,0x71,0x6c,0x5f,0x69,0x6e,0x6a,0x65,0x63,0x74,0x69,0x6f,0x6e%29,null,null,null,0,0,1%20--%20%27 HTTP/1.1" 301 445 "-" "-"
- - [10/Oct/2017:04:20:19 +0000] "GET /recipe_view.php?intId=char%2839%29%2b%28SELECT HTTP/1.1" 301 282 "-" "-"
- - [10/Oct/2017:04:20:42 +0000] "GET /modules.php?name=Search&type=stories&query=qualys&category=-1%20&categ=%20and%201=2%20UNION%20SELECT%200,0,aid,pwd,0,0,0,0,0,0%20from%20nuke_authors/* HTTP/1.1" 301 410 "-" "-"
- - [10/Oct/2017:04:22:32 +0000] "GET /third_party/fckeditor/editor/_source/classes/fckstyle.js HTTP/1.1" 301 284 "-" "-"
- - [10/Oct/2017:04:22:32 +0000] "GET /third_party/tinymce/jscripts/tiny_mce/plugins/advlink/readme.txt HTTP/1.1" 301 292 "-" "-"
- - [10/Oct/2017:04:22:32 +0000] "GET /rsc/smilies/graysmile.gif HTTP/1.1" 301 253 "-" "-"
- - [10/Oct/2017:04:22:32 +0000] "GET /media/users/admin/faceyourmanga_admin_girl.png HTTP/1.1" 301 274 "-" "-"


The tomcat service was being pushed to its limits and using many more resources than planned.  There were upwards of 10,000 requests in bursts from a single IP address.  From the logs it certainly looks like an attack, but the traffic was coming from an internal IP address.
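
To see which source is responsible for the bursts, a quick per-client tally of the access log goes a long way.  The sketch below assumes the client IP is the first field on each line, which is the usual Apache log layout (the addresses are stripped from the excerpt above).  The function name top_clients is my own:

```shell
# Print the busiest clients in an Apache access log, most requests first.
# $1 = log file path, $2 = how many clients to show (default 5).
# Assumption: the client IP is the first whitespace-separated field.
top_clients() {
    awk '{ print $1 }' "$1" | sort | uniq -c | sort -rn | head -n "${2:-5}"
}
```

For example, top_clients /var/log/apache2/access_log 10.  Each output line is a request count followed by the client address, which makes a convenient hand-off to the security team.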

My advice – get your security team to white-list your vROps appliances.

To restart the web service on all vROps nodes, either issue this command on each node: ‘service vmware-vcops-web restart’, or log into the admin page, take the cluster offline, and then bring it back online.

Install Hyperic Agent 5.8.x On SUSE 11 and SUSE 12 Based VMware Appliances

Let me start out by saying that if you’d like to install the Hyperic agent, part of a VMware platform (vRealize Hyperic) that is nearing the end of its life (late 2018), you should first **make sure that having the agent installed on VMware’s SUSE-based appliances is supported.**

vRealize Hyperic is a terrific platform that unfortunately has reached the end of its product development life cycle, and will ultimately reach the end of support in late 2018.

With that said…

In this particular case I wanted to monitor the SUSE appliance virtual machines of VMware’s vCloud Availability, and since I already am using Hyperic to monitor our production environment management virtual machines…

  • To start the installation run:
# zypper install vcenter-hyperic-agent-5.8.4.EE-1.noarch.rpm

example output:


  • Respond with:     a

example output:


  • Respond with:      y

example output:



  • Edit /etc/sysconfig/SuSEfirewall2 and update lines 281 and 379 with the addition of port 2144 for SUSE 11, or lines 253 and 351 with the addition of port 2144 for SUSE 12
  • Note: For listing multiple ports, SuSEfirewall2 uses a space-separated schema such as “1234 1234 1234”.  Insert port 2144 where applicable.

Line 281 for SUSE 11, or line 253 for SUSE 12


Line 379 for SUSE 11, or line 351 for SUSE 12

  • Stop and start the firewall so configuration is loaded
/sbin/SuSEfirewall2 stop

Pause 5 seconds

/sbin/SuSEfirewall2 start


  • Edit /etc/init.d/hyperic-hqee-agent and copy line 17:  #export JAVA_HOME=/usr/lib/jvm/java-6-openjdk/jre
    • For VMware appliances SUSE 12 this needs to be updated to: export JAVA_HOME=/usr/java/jre-vmware.
    • For VMware appliances SUSE 11 this needs to be updated to:  export HQ_JAVA_HOME=/usr/java/default
  • Add the new line, save and quit



  • Prior to starting the service, be sure to uncomment and modify the agent.setup values in the agent.properties file in /opt/hyperic/hyperic-hqee-agent/conf:
# vi /opt/hyperic/hyperic-hqee-agent/conf/agent.properties

Uncomment and modify lines 71 through 80

agent.setup.camIP=<hyperic server IP or FQDN>
agent.setup.camPword=<hqadmin_password>

Uncomment line 86


Modify line 204.  set to =true

  • Enter ‘:wq’ to save the file and exit


  • Start the agent with either of the following commands:
# sh /opt/hyperic/hyperic-hqee-agent/bin/hq-agent.sh start

-= OR =-

#  /etc/init.d/hyperic-hqee-agent start


  • Now you should be able to log into the hyperic UI and add the new server to inventory
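
If you are doing this on more than a couple of appliances, the uncommenting part of the agent.properties edit can be previewed with sed before touching the file.  This is only a sketch: it assumes the setup entries are commented out as lines starting with #agent.setup., and the function name uncomment_agent_setup is my own.  It prints the result to stdout rather than editing in place, so you can review it and still set the camIP and camPword values by hand:

```shell
# Print a copy of an agent.properties file with every commented-out
# "agent.setup." entry uncommented.  The original file is not modified.
# Assumption: those entries look like "#agent.setup.<name>=<value>".
uncomment_agent_setup() {
    sed 's/^#[[:space:]]*\(agent\.setup\.\)/\1/' "$1"
}
```

For example, uncomment_agent_setup /opt/hyperic/hyperic-hqee-agent/conf/agent.properties | grep '^agent.setup' shows what the setup block would look like once uncommented.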

Upgrade Existing vRealize Operations Manager Add-on/Solution Paks

The following was recorded using a vRealize Operations Manager (vROps) 6.6 cluster; however, older versions of vROps can be upgraded the same way.

  • Log into the vROps environment, go to the Administration tab, and select solutions in the left column.
  • Here you can see all of the add-on/solutions paks that I have installed in this environment.  To upgrade an existing solution, simply click the green plus button.
  • Browse for the new pak.  In this example I have selected the “Reset Default Content” option.  As the statement suggests, this can override policies, alerts, symptoms etc. that may have been customized by your organization, forcing that work to be re-created.  However, I like using this option because I get those new changes, and can adjust my monitoring accordingly.  Use at your own discretion.


  • Click ‘upload’
  • Click ‘Next’
  • Read and accept the EULA if you so desire
  • Click ‘Next’

Now the installation process will begin.  This shouldn’t take longer than 5 minutes.


  • Click Finish


Now the latest version of the Add-on/solutions pak is installed and ready for use.  In most cases it will just pick up the config from older versions.

Collecting Java Heap dump from vCloud Director Cells

You just need to generate the java heap dump from one of the cells.  What you’ll need to succeed:

  • iptables disabled on the cell you are connecting to.
  • Disk space available on the cell to accommodate the dump – I believe these can be between 8 and 10 GB in size
  • Unless it is an emergency, do this operation outside of normal business hours, as it will be CPU intensive for up to 3 minutes, can impact API call performance, and can potentially cause the vCD cell inventory service to hang.

Step #1: Disable iptables on the cell

  • ssh to the desired cell and run the following command:

# service iptables stop

Step #2: Connect with jconsole (java console)

  • domain credentials should work here depending on your environment
  • connect to port: 8999
  • connect to desired cell


  • If you get this message “Secure connection failed. Retry Insecurely?” just click the ‘insecure’ button to continue



Step #3: Generate the heap dump

  1. On the MBeans tab, in the com.sun.management/HotSpotDiagnostic object, select the Operations section.
  2. In dumpHeap parameters, enter the following information:

    p0: [heap-output-path]

    p1: true – do a garbage collection before dump heap

    For example:

    p0: /opt/vmware/vcloud-director/vcd_cell_name_heap-dump-file.hprof

    p1: true

  3. Click the dumpHeap button.



  • There will be no indication that the heapdump completes.  I just watch the size of the file until the growth stops on the cell.  This process typically takes less than two minutes.
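
Watching the file by hand works, but the same idea can be scripted.  Here is a minimal sketch that polls the file size and returns once two consecutive checks see the same size.  The function name wait_for_stable_size and the default 10-second interval are my own choices, and a pause in writing that lasts longer than the interval would fool it, so pick an interval generously:

```shell
# Block until a file stops growing, then print its final size in bytes.
# $1 = file path, $2 = seconds between checks (default 10).
# Returns 1 immediately if the file does not exist.
wait_for_stable_size() {
    file="$1"
    interval="${2:-10}"
    [ -f "$file" ] || { echo "no such file: $file" >&2; return 1; }
    prev=-1
    while :; do
        # Arithmetic expansion strips any padding wc adds on some systems.
        size=$(( $(wc -c < "$file") ))
        if [ "$size" -eq "$prev" ]; then
            echo "$size"
            return 0
        fi
        prev=$size
        sleep "$interval"
    done
}
```

Using the example path from Step #3, that would be wait_for_stable_size /opt/vmware/vcloud-director/vcd_cell_name_heap-dump-file.hprof 15, run on the cell after clicking the dumpHeap button.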

Step #4: Cleanup and send-off

  • Locate the heap dump in /opt/vmware/vcloud-director/ and move it to a location where you can compress it and upload it to the VMware FTP site, as you would for logs.
  • Start iptables again on the cell: # service iptables start