vSphere with Tanzu: Project Contour TLS Delegation Invalid – Secret Not Found

Blog Date: June 25, 2021
Tested on vSphere 7.0.1 Build 17327586
vSphere with Tanzu Standard

On a recent customer engagement, we ran into an issue where after we deployed project Contour, and created a TLS delegation “contour-tls”, but we ran into an issue where Contour did not like the public wildcard certificate we provided. We were getting an error message “TLS Secret “projectcontour/contour-tls” is invalid: Secret not found.”

After an intensive investigation to make sure everything in the environment was sound, we came to the conclusion that the “is invalid” part of the error message suggested that there was something wrong with the certificate. After working with the customer we discovered that they included the Root, the certificate, and the intermediate authorities in the PEM file. The root doesn’t need to be in the pem.  Just the certificate, and the intermediate authorities in descending order. Apparently that root being in the pem file made Contour barf. Who knew?

You could possibly see that the certificate is the issue by checking the pem data for both the <PrivateKeyName>.key and the <CertificateName>.crt by running the following commands, and comparing the pem output. IF it doesn’t match this could be your issue as well. The “<>” should be updated with your values, and don’t include these “<” “>”.

openssl pkey -in <PrivateKeyName>.key -pubout -outform pem | sha256sum
openssl x509 -in <CertificateName>.crt -pubkey -noout -outform pem | sha256sum

Below are the troubleshooting steps we took, and what we did to resolve the issue. We were using Linux, and had been logged into vSphere with Tanzu already. Did I mention that I hate certificates? But I digress….

The Issue:

You had just deployed a TKG cluster, and then deployed Project Contour as the ingress controller that uses a load balancer to be the single point of entry for all external users. This connection terminates SSL connections, and you have applied a public wildcard certificate to it. You created the TLS secret, and have created the TLS delegation, so that new apps deployed to this TKG cluster can delegate TLS connection terminations to contour. However, after you deployed your test app to verify the TLS delegation is working, you see a status of “Invalid. At least one error present, see errors for details.”, after running the following command:


kubectl get httpproxies

Troubleshooting:

  1. You run the following command to gather more information, and see in the error message: “Secret not found” Reason: “SecretNotValid
kubectl describe httpproxies.projectcontour.io

2. You check to make sure the TLS Secret was created in the right namespace with the following command, and you see that it is apart of the desired namespace. In this example, our namespace was called projectcontour, and the TLS secret was called contour-tls.


kubectl get secrets -A

3. You check the TLS delegation to make sure it was created with the following command. In this example ours was called contour-delegation, and our namespace is projectcontour.

kubectl get tlscertificatedelegations -A

4. You look at the contents of the tlscertificatedelegations with the following command, and nothing looks out of the ordinary.

kubectl describe tlscertificatedelegations -A

5. You check to see the secrets of the namespace with the following command. In this example our namespace is called projectcontour and we can see our TLS delegation contour-tls.


kubectl get secrets --namespace projectcontour

6. You validate contour-tls has data in it with the following command. In this example, our namespace is projectcontour and our TLS is contour-tls.

kubectl get secrets --namespace projectcontour contour-tls -o yaml

In the yaml output, up at the top you should see tls.crt: with data after

Down towards the bottom of the yaml output, you should see tls.key with data after

Conclusion: Everything looks proper on the Tanzu side. Based on the error message we saw “TLS Secret “projectcontour/contour-tls” is invalid: Secret not found.” The “is invalid” part could suggest that there is something wrong with the contents of the TLS secret.  At this point, the only other possibility would be that the public certificate has something wrong and needs to be re-generated.   The root doesn’t need to be in the pem.  Just the certificate for the site, and intermediate authorities in descending order.

The Resolution:

  1. Create and upload the new public certificate for contour to vSphere with Tanzu.
  2. We’ll need to delete the secret and re-create it.  Our secret was called “contour-tls”, and the namespace was called “projectcontour”.
kubectl delete secrets contour-tls -n projectcontour

3. We needed to clean our room, and delete the httpproxies we created in our test called test-tls.yml, and an app that was using the TLS delegation. In this example it was called tls-delegation.yml

kubectl delete -f test-tls.yml
kubectl delete -f tls-delegation.yml

4. Now we created a new secret called contour-tls with the new cert. The <> indicates you need to replace that value with your specific information. The “<>” does not belong in the command.

kubectl create secret tls contour-tls -n projectcontour --cert=<wildcard.tanzu>.pem --key=<wildcard.tanzu>.key

5. Validate the pem values for .key and .crt match. The <> indicates you need to replace that value with your specific information. The “<>” does not belong in the command.


openssl pkey -in <PrivateKeyName>.key -pubout -outform pem | sha256sum

openssl x509 -in <CertificateName>.crt -pubkey -noout -outform pem | sha256sum

6. If the pem numbers match, the certificate is valid. Lets go ahead an re-create the tls-delegation.  Example command:

kubectl apply -f tls-delegation.yml

7. Now you should be good to go. After you deploy your app, you should be able to check the httpproxies again for Project Contour, and see that it has a valid status

kubectl get httpproxies.projectcontour.io

If all else fails, you can open a ticket with VMware GSS to troubleshoot further.

vSphere with Tanzu Validation and Testing of Network MTU

Blog Date: June 18, 2021
Tested on vSphere 7.0.1 Build 17327586
vSphere with Tanzu Standard


On a recent customer engagement, we ran into an issue where vSphere with Tanzu wasn’t successfully deploying. We had intermittent connectivity to the internal Tanzu landing page IP. What we were fighting ended up being inconsistent MTU values configured both on the VMware infrastructure side, and also in the customers network. One of the many prerequisites to a successful installation of vSphere with Tanzu, is having a consistent MTU value of 1600.


The Issue:


Tanzu was just deployed to an NSX-T backed cluster, however you are unable to connect to the vSphere with Tanzu landing page address to download Kubernetes CLI Package via wget.  Troubleshooting in NSX-T interface shows that the load balancer is up that has the control plane VMs connected to it.


Symptoms:

  • You can ping the site address IP of the vSphere with Tanzu landing page
  • You can also telnet to it over 443
  • Intermittent connectivity to the vSphere with Tanzu landing page
  • Intermittent TLS handshake errors
  • vmkping tests between host vteps is successful.
  • vmkping tests from hosts with large 1600+ packet to nsx edge node TEPs is unsuccessful.


The Cause:


Improper/inconsistent MTU settings in the network data path.  vSphere with Tanzu requires minimum MTU of 1600.   The MTU size must be 1600 or greater on any network that carries overlay traffic.  See VMware documentation here:   System Requirements for Setting Up vSphere with Tanzu with vSphere Networking and NSX Advanced Load Balancer (vmware.com)


vSphere with Tanzu Network MTU Validations:


These validations should have been completed prior to the deployment. However, in this case we were finding inconsistent MTU settings. So to simplify, these are what you need to look for.

  • In NSX-T, validate that the MTU on the tier-0 gateway is set to a minimum of 1600.
  • In NSX-T, validate that the MTU on the edge transport node profile is set to a minimum of 1600.
  • In NSX-T, validate that the MTU on the host uplink profile is set to a minimum of 1600.
  • In vSphere, validate that the MTU on the vSphere Distributed Switch (vDS) is set to a minimum of 1600.
  • In vSphere, validate that the MTU on the ESXi management interface (vmk0) is set to a minimum of 1600.
  • In vSphere, validate that the MTU on the vxlan interfaces on the hosts is set to a minimum of 1600.


Troubleshooting:

In the Tanzu enabled vSphere compute cluster, SSH into an ESXi host, and ping from the host’s vxlan interface to the edge TEP interface.  This can be found in NSX-T via: System, Fabric and select Nodes, edge transport nodes, and find the edges for Tanzu. The TEP interface IPs will be to the right.  In this lab, I only have the one edge. Production environments will have more.

  • In this example, vxlan was configured on vmk10 and vmk11 on the hosts. Your mileage may vary.
  • We are disabling fragmentation with (-d) so the packet will stay whole. We are using a packet size of 1600
# vmkping -I vmk10 -S vxlan -s 1600 -d <edge_TEP_IP>
# vmkping -I vmk11 -S vxlan -s 1600 -d <edge_TEP_IP>
  • If the ping is unsuccessful, we need to identify the size of the packet that can get through.  Try a packet size of 1572. If unsuccessful try 1500.  If unsuccessful try 1476. If unsuccessful try 1472, etc.

To test farther up the network stack, we can perform a ping something that has a different VLAN, subnet, and is on a routable network.  In this example, the vMotion network is on a different network that is routable. It has a different VLAN, subnet, and gateway.  We can use two ESXi hosts from the Tanzu enabled cluster.

  1. Open SSH sessions to ESXi-01 and ESXi-02.
  2. On ESXi-02, get the PortNum for the vMotion vmk. On the far left you will see the PortNum for the vMotion enabled vmk. Run the following command:
# net-stats -l

3. Run a packet capture on ESXi-02 like so:

# pktcap-uw --switchport <vMotion_vmk_PortNum> --proto 0x01 --dir 2 -o - | tcpdump-uw -enr -

4. On the ESXi-01 session, use the vmkping command to ping the vMotion interface of ESXi-02.  In this example we use a packet size of 1472 because that was the packet size the could get through, and option -d to prevent fragmentation.

# vmkping -I vmk0 -s 1472 -d <ESXi-02_vMotion_IP>

5. On the ESXi-02 session, we should now see six or more entries. Do a CTRL+C to cancel the packet capture.

6. Looking at the packet capture output on ESXi-02, We can see on the request line that ESXi-01 MAC address made a request to ESXi-02 MAC address.

  • On the next line for reply, we might see a new MAC address that is not ESXi-01 or ESXi-02.  If that’s the case, then give this MAC address to the Network team to troubleshoot further.



Testing:

Using the ESXi hosts in the Tanzu enabled vSphere compute cluster, we can ping from the host’s vxlan interface to the edge TEP interface.

  • The edge TEP interface can be found in NSX-T via: System, Fabric and select Nodes, edge transport nodes, and find the edges for Tanzu. The TEP interface IPs will be to the far right.
  • You will need to know what host vmks the vxlan is enabled. In this example we are using vmk10 and vmk11 again.

In this example we are using vmk10 and vmk11 again. We are disabling fragmentation with (-d) so the packet will stay whole. We are using a packet size of 1600. These should now be successful.

The commands will look something like:

# vmkping -I vmk10 -S vxlan -s 1600 -d <edge_TEP_IP>
# vmkping -I vmk11 -S vxlan -s 1600 -d <edge_TEP_IP>

On the ESXi-01 session, use the vmkping command to ping something on a different routable network, so that we can force traffic out of the vSphere environment and be routed. In this example just like before, we will be using the vMotion interface of ESXi-02. Packet size of 1600 should now work. We still use option -d to prevent fragmentation.

# vmkping -I vmk0 -s 1600 -d <ESXi-02_vMotion_IP>

On the developer VM, you should now be able to download the vsphere-plugin.zip from the vSphere with Tanzu landing page with the wget command.

# wget https://<cluster_ip>/wcp/plugin/linux-amd64/vsphere-plugin.zip

With those validations out of the way, you should now be able to carry on with the vSphere with Tanzu deployment.

Configuring vCloud Director 10 Part 2. Adding Provider Virtual Data Center.

This is a continuation of deploying VMware Cloud Director 10. In my last post, I walked through configuring the vSphere lookup service, and adding the vCenter (here). In this post I’ll go over adding a Provider Virtual Data Center (PVDC).

Adding a PVDC

Log into the vCD provider interface, and switch to the Cloud Resources view by clicking the menu to the right of vCloud Director logo. Select the Provider VDCs option in the menu on the left, and then “NEW” link to begin.

On page 1, you’ll have to fill in some general information about the PVDC. Give it a name and description meaningful to the resources the PVDC will be connected to. In this example, I am connecting to my home lab. Click NEXT.

On page 2, select the vCenter and click NEXT.

On page 3, you’ll see the available resources. This would be for both compute and storage. In this example I am using a lab, so I only have one available. Hardware compatibility is also configured here for the future tenants deployed to this PVDC. Click NEXT

On Page 4, the available storage policies configured in the vCenter that the tenants would use in this PVDC, will be available for selection here. Click NEXT.

On Page 5, your mileage may vary depending how your environment is configured. In my lab example, I have chosen the default selection. Click NEXT.

On page 6, you are presented with a confirmation of the selected config. Make any adjustments, and click FINISH.

Be patient as it can take some time to build the PVDC. Just monitor the recent tasks for task progress and completion. The end result should show a “Normal” for a status under the configured Provider VDCs.

At this point, the provider side configuration is almost complete. We still need to configure the public facing address. If this were a production deployment, we also find it necessary to configure a VIP/load balancer to place in front of the VCD appliances to handle traffic load. For production deployments there would also be the need to setup signed certificates for the appliances.

In my next blog I’ll go over configuring the public address.

VMworld General Session, Day 2, Tuesday August 27th, 2019: VMware Tanzu demos, and new CTO announcement!

Day 3 of VMworld 2019 in San Francisco is underway, and it is the second day of General sessions. Clearly today’s theme is Kubernetes, and VMware’s Ray O’Farrell kicked off the keynote by talking about VMware Tanzu and Tanzu’s mission control.

The Keynote then included the integration of NSX-T with Tanzu. The ability to test changes, to see the impact on the environment before going live, was truly amazing

There was also an interesting demo with VMware Horizon and Workspace ONE, showcasing the usage deploying work spaces rapidly from the cloud, and creating zero-trust security policy withing workspace ONE with Carbon Black

Pat jumped up on stage to announce that Ray O’Ferrell (@ray_ofarrell) would be leading VMware’s cloud native apps division, and Greg Lavender (@GregL_VMware) was named the New CTO of VMware.

VMware also announced a limited edition t-shirt that would be given away later that day. VMware had roughly 1000 of these shirts made up, and luckily I was able to get a shirt before they ran out.

Plenty of people were upset about not getting a shirt due to the limited run. Gives a whole new meaning to nerd rage…. (sorry I couldn’t help myself).

VMworld General Session, Day 1, Monday August 26th, 2019: VMware Tanzu, and Project Pacific.

The start of VMworld 2019 in San Francisco is underway, and Pat kicked off the general session talking about his excitement for being back in San Francisco, while poking fun at us “Vegas lovers”. Pat also talked about technology, our digital lives, and technologies role being a force for good. He talked about charities, and cancer research foundations.

Pat Then talked about The Law of Unintended Consequences, and how technology has advanced, we as a society have given up certain aspects of Privacy, the need to combat disinformation at scale available widely on the social media platforms.

Surprisingly, according to Pat, Bitcoin is Bad and contributes to the climate crisis.

First Major Announcement with Kubernetes, as VMware has been focussing on containers

Pat then announced the creation of VMware Tanzu, which is the initiative to have a common platform that allows developers to build modern apps, run enterprise Kubernetes, and platform to manage Kubernetes for developers and IT..

Second Major Announcement, Project Pacific. An ambitious project to unite vSphere and Kubernetes for the future of modern IT

Interestingly, Project Pacific was announced to be 30% faster than a traditional Linux VM, and 8% faster than solutions running on bare metal.

Project Pacific brings Kubernetes to the VMware Community, and will be offered by 20K+ Partner resellers, 4K+ Service providers and 1,100+ technology partners.

Tanzu also comes with mission control, a centralized tool allowing IT Operations to manage Kubernetes for developers and IT.

First Time Speaking At The St. Louis VMUG UserCon

Blog Date: July 21, 2012

The VMUG leadership invited me to speak at the St. Louis VMUG Usercon on April 18, 2019, and share my presentation on How VMware Home Labs Can Improve Your Professional Growth and Career.

This would be my second time giving a public presentation, but I left The Denver VMUG UserCon with a certain charge, or a spring in my step as it were. I didn’t have a lot of time to prepare or to change up my presentation, remembering that I have a PSO customer that I need to take care of. I arrived a day early for the speaker dinner that was being put on by the St. Louis VMUG leadership.

Prior to the dinner, I was able to explore the historical, and picturesque city of St. Charles.

The next day, we all converged on the convention center for the St. Louis UserCon. This way to success!

Seeing your name as a speaker amongst a list of people you’ve looked forward to meeting, have met, or follow on social media, certainly is humbling.

This time, my session was in the afternoon, so in true fashion of many public speakers in the #vCommunity, I had all day to make tweaks. I was also able to join a few sessions. Finally found my room in the maze of this convention center and got setup.

The ninja, and co-leader of the St. Louis UserCon, Jonathan Stewart (@virtuallyanadmi), managed to take a picture of me giving my presentation.

A special thank you to the St. Louis VMUG leadership team, who invited me out to meet and share with their community: Marc Crawford (@uber_tech_geek), Jonathan Stewart (@virtuallyanadmi) and Mike Masters (@vMikeMast)

First Time Speaking At The Denver UserCon

Blog date: July 9th, 2019

This post is a little late, considering the Denver VMUG UserCon was on April 9th, but alas I have been traveling a lot over the past few months

The Denver UserCon was my first time speaking at a public event in front of a large audience. Public speaking is something that I have thought about doing for a while now, and how fitting that my first event, be a year after my very first UserCon attendance, at the very same venue. If I am completely honest, as this was my first time, I was a little nervous, but like anything you just have to take that leap of faith.

My good friend Ariel Sanchez (@arielsanchezmor) has been encouraging me to start this journey, and because this will give me the skills I need for a future role as a marketing engineer, I decided this would be the year to get my feet wet. But what to present?

I’d like to think that most presenters have a blog, that they can reshape into a power point presentation, so I submitted two community sessions. One about vROps, and the other about VMware homelabs. We in the #vCommunity use home labs a lot. Not only our daily use, but to better ourselves professionally. Home labs give us a safe place to experiment with new VMware releases, plan upgrades, and just familiarize ourselves with other products that we may not have a chance to work with in a production environment.

As such, the VMUG community leadership selected my presentation on “How VMware Home labs How a VMware Home lab Can Accelerate Your Career”.

Given the amount of attendees who came to learn and support my first presentation, I’d say that the desire to learn and build a VMware home lab is strong.

I’m not going to lie, as this was my first time, I was certainly nervous. Breath…..count to five……..jump. The presentation was well received, and my friends who joined in support, gave me some positive feedback, along with constructive feedback for improvement.

The evening was capped off with a nice dinner with friends from the #vCommunity.

Dinner with @vGonzilla @vCenterNerd @hcsherwin @crystal_lowe @scott_lowe and @arielsanchezmor

A special thank you to the Denver VMUG leadership team, who invited me out to meet and share with the community: Jason Valentine (@JasonV_VCP5), Tony Gonzalez (@vGonzilla) and Scott Seifert (@vScottSeifert)

Upgrading To vSphere 6.7 Update 1, and Using The vCenter Converge Tool: Part 2

In this second part of the blog series “Upgrading to vSphere 6.7 Update 1, and Using the vCenter Converge Tool”, I will go over my experience using the Convergence tool. Lets get started.

The Basics of the vCenter Converge Tool

David Stamen (@davidstamen) put together an excellent blog on Understanding the vCenter Server Converge Tool, at VMware’s offical blog site, which I found very useful. Shout-out to Nigel Hickey (@vCenterNerd) for answering some questions I had.

The Convergence Tool basically takes the external PSC and embeds it into the vCenter appliance like so:

Photo credit
@davidstamen

For this customer, I had three vCSA’s and three PSC’s that I needed to converge. Most of the blogs that I found didn’t cover PSC’s that were joined to a domain, environments with multiple vCenters, or with multiple PSC’s, so I thought I would write this up in a blog.

Planning the Convergence

The first thing I had to do was take note of any registered services with the SSO domain. I utilized VMware’s KB2043509 to identify these services, which I had none to worry about. VMware specifically calls out NSX and Site Recovery Manager (SRM), but since those were not in use at this customer, the only things I had to worry about were Horizon, vROps, vRLi and Zerto. Each of these services registered directly to the vCenters, so I had nothing to worry about there. If I had any services registered with the SSO domain, I’d simply need to re-register them once the convergence tool was ran. But since this didn’t apply, I can move forward with configuring the scripts for the convergence tool.

I also need to have an understanding of the replication typology of the existing SSO domain. VMware KB2127057 was an excellent resource I used to gather that information. Opening a putting session to a vCenter, and running the ‘vdcrepadmin’ command against each of the external PSCs, I was able to see the following:

# cd /usr/lib/vmware-vmdir/bin

./vdcrepadmin -f showpartners -h external_psc-a.domain.com -u administrator -w kjdshfsdkjfhskjdhf

ldap://external_psc-b.domain.com
ldap://external_psc-c.domain.com

-----------------------------------------------------------------

./vdcrepadmin -f showpartners -h external_psc-b.domain.com -u administrator -w kjdshfsdkjfhskjdhf

ldap://external_psc-a.domain.com
ldap://external_psc-c.domain.com

-----------------------------------------------------------------

./vdcrepadmin -f showpartners -h external_psc-c.domain.com -u administrator -w kjdshfsdkjfhskjdhf

ldap://external_psc-a.domain.com
ldap://external_psc-b.domain.com

I can see they already have a ring topology, which is the desired architecture. If I were to draw the SSO typology out, it would look something like:

Setting Up the JSON Templates for the Convergence Tool

The converge.json template that the convergence tool uses, can be found in the VMware VCSA ISO, that was used for the 6.7 Update 1 upgrade, under the following path: DVD Drive (#):\VMware VCSA\vcsa-converge-cli\templates\

To make my life easier, I copied the contents of the entire ISO to a folder on the root of my C drive. I then made a seperate folder on the root of C called converge, and created a folder for each of the three vCenters I’d be working with: vCenter-A, vCenter-B, vCenter-C. I made a copy of the converge.json, and placed it into each folder.

Taking a look at the converge.json for vCenter-A, the template tells you what data needs to be filled in, so pay close attention. Lines 10 – 15 needs entries for the ESXi host where the vCenter resides, or the managing vCenter appliance. Here I chose the option to used the Managing ESXi host. All I needed to do, was look in vSphere to see where the vCSA appliance VM resided on which host. While there, I also set the Cluster DRS settings to manual, to prevent the VMs from moving during the upgrade. Once I obtained the information needed, I completed that portion of the json. (I’ve redacted environment specific information).

Lines 16 – 21 need data entries for the first vCenter appliance (vCenter-A) to be converged. Here I need the FQDN for vCenter-A, for the Username, I need the administrator@vsphere.local account, its password, and the root password of the appliance.

Lines 22 – 33 would be filled out IF the Platform Services Controller (PSC) appliance is joined to the domain. My customer was joined to the domain, so I needed to fill this section out. Otherwise you can remove this section from the JSON.

Now, because this is the first vCenter of three, in the same SSO domain, for the first convergence, I did not need this section, because the first vCenter does not have a partner yet. It will be needed however, on the second (vCenter-B) and third (vCenter-C) convergences.

Now I need to fill out a second and third converge.json file for the second and third convergence, saving each in its respective folder. For vCenter-B and vCenter-C, for the partner hostname on line 32, I used the FQDN of the first converged vCenter (vCenter-A), as that is the first partner of the SSO domain.

For vCenter-A, the first to be converged, the completed converge.json looks like this (take note of the commas, brackets and lines removed):

For the second convergence (vCenter-B), and third convergence (vCenter-C), the completed converge.json looks like this:

Now that we got the converge.json done for each of the vCenters, we can work on the decommission.json.

Here is the template VMware provides in the same directory:

Lines 11 – 15 require impute for the Managing vCenter or ESXi Host of the External PSC. Again, just like the vCenter, I used the ESXi host that the PSC is running on.

Lines 16 – 21 needs data for the Platform Services Controller that will be Decommissioned.

Lines 30 – 34 requires information for the vCenter the PSC was paired with. Again here I just used the ESXi host that the vCenter is currently running on

Lines 35 – 39 require the information for the vCenter, the PSC is paired with.

Now that we have the decommission.json filled out for the first vCenter (vCenter-A), I have to repeat the process for the second and third vCenters (vCenter-B, vCenter-C). The full decommission.json should look like

Now that both the converge.json and decommission.json have been filled out for each of my environments (3), and stored in the same directory on the root of C, I can move forward with the Convergence process.

Prerequisites and Considerations Before Starting the Convergence Process

  • The converge tool only supports the VCSA and PSC 6.7 Update 1. All nodes must be on 6.7 Update 1 before converting.
  • If you are currently running a Windows vCenter Server or PSC, you must migrate to the appliance first.
  • Before converting, take a backups of your VCSA(s) and PSCs in the vSphere SSO domain(VM snapshots, and DB backups).
  • Know all other solutions using the PSC for authentication in the environment. They will need to be re-registered after the convergence completes and before decommissioning.
  • A machine on a routable network which can communicate with the VCSA and PSC will be used to run the convergence and decommission process.
  • Set the DRS Automation Level to manual, and the Migration Threshold to conservative. There will be be issues if the VCSA being converged is moved during the process.
  • If VCHA is enabled, it must be disabled prior to running the convergence process.
  • The converge process will handle PSC HA load balancers. Make sure you point to the VIP in the JSON template if you have them.
  • All vSphere SSO data is migrated with the exception of local OS users.
  • Best to take snapshots of the vCSA and external PSC VMs before continuing. We’ve already backed up the database, but it doesn’t hurt to have snapshots as well.

Executing the Converge Tool

Now that converge.json template for each vCenter (vCenter-A, vCenter-B, vCenter-C) is filled out properly, we can now execute. We will run the convergence tool against the first vCenter (vCenter-A). Note: We can only run the converge tool against one vCSA at a time.

In powershell, we can first run the following command before proceeding with the upgrade to see what options/parameters are available with the converge tool.

.\vcsa-converge-cli\win32\vcsa-util.exe converge --help 

To execute the convergence tool against the first vCenter (vCenter-A), I ran the following command:

.\vcsa-converge-cli\win32\vcsa-util.exe converge --no-ssl-certificate-verification --backup-taken C:\pathtofile.json

The output in powershell should look something like:

It will then ask you to reboot the first vCenter before continuing.

Once the first vCenter (vCenter-A) came up, I executed the convergence tool for the second vCenter (vCenter-B). Once completed I restarted the appliance.

Finally, the last vCenter (vCenter-C) is on deck. I executed the converge.json against that vCenter, and once completed, I restarted it.

Here is where you would need to re-point those systems using the old SSO domain, but since I didn’t have any, I can move forward with the decommissioning steps.

Decommissioning the Old external Platform Services Controllers (PSC)

Using the Converge Tool with the decommission option to remove the external PSC’s. Just like before, we need to do this one PSC at a time. The command looks something like this:

 .\vcsa-converge-cli\win32\vcsa-util.exe decommission --no-ssl-certificate-verification C:\pathtofile.json 

Once the process successfully completes, move onto the next PSC. Repeat the process until all PSC’s have been decommissioned.

Validate the SSO Replication Topology After the Converge Process

If you’ll remember, when I setup the converge.json, I had the second vCenter (vCenter-B) and third vCenter (vCenter-C) replication partner set to the first converged vCenter (vCenter-A). My Replication topology currently looks like this:

I needed to close the loop between vCenter-B and vCenter-C. Using VMware’s KB2127057 , I used the ‘createagreement’ parameter. I opened a putty session to vCenter-B and ran the following command:

# cd /usr/lib/vmware-vmdir/bin

./vdcrepadmin -f createagreement -2 -h vcenter-b.domain.com -H vcenter-c.domain.com -u Administrator -w VMw@re123

Now that the SSO replication agreement has been made between vCenter-B and vCenter-C, my replication topology looks like this:

I’m not going to lie, the hardest part of using the convergence tool, was just getting started. I’ve been through enough fires in my day to know how bad of a time I would have had if something went wrong, and I lost either the vCenter, or external PSC before the convergence successfully completed. Once I got myself beyond that mental hurdle, the process was actually quite easy and smooth.

I know I’ve left this customer’s environment in a lot better shape than I found it, and having embedded PSCs will make future vCenter upgrades a breeze. For a VMware PSO consultant, this was a huge value add for the customer.

Blog Date: April 16, 2019

Upgrading To vSphere 6.7 Update 1, and Using The vCenter Converge Tool: Part 1

I recently wrapped up a vSphere 6.7 U1 upgrade project, while on a VMware Professional Services (PSO) engagement, with a customer in Denver Colorado. On this project, I had to upgrade their three VMware environments from 6.5, to 6.7 Update 1. This customer also had three external Platform Services Controllers (PSC), a configuration that is now depreciated in VMware architecture.

Check the VMware Interoperability, and Compatibility Matrices

The first thing I needed to do, was to take inventory of the customer’s environment. I needed to know how many vCenters, if they had external Platform Services Controllers, how many hosts, vSphere Distributed Switch (VDS), and what the versions were.

  • From my investigation, this customer had three vCenters, and three external platform services controllers (PSC), all apart of the same SSO domain.
  • I also made note of which vCenter was paired with what external PSO. This information is critical not only for the vSphere 6.7 U1 upgrade, but also the convergence process that I will be doing in part two of this blog series.
  • Looking at the customer’s ESXi hosts, the majority were running the same ESXi 6.5 build, but I did find a few Nutanix clusters, and six ESXi hosts still on version 5.5.
  • The customer had multiple vSphere Distributed Switch (VDS) that needed to be upgraded to 6.5 before the 6.7 upgrade.

The second thing that I needed to do was to look at the model of each ESXi host and determine if it is supported for the vSphere 6.7 U1 upgrade. I also need to validate the firmware and BIOS each host is using, to see if I need to have the customer upgrade the firmware and BIOS of the hosts. We’ll plug this information into the VMware Compatibility Guide .

  • From my investigation, the three ESXi hosts running ESXi 5.5 were not compatible with 6.7U1, however they were compatible with the current build of ESXi 6.5 the customer was running on their other hosts. I would need to upgrade these hosts to ESXi 6.5 before starting the vSphere 6.7 U1 upgrade.
  • This customer had a mix of Dell and Cisco UCS hosts, and almost all needed to have their firmware and BIOS upgraded to be compatible with ESXi 6.7 U1.

The third thing I needed to check was to see what other platforms, owned by VMware, and/or bolt on third parties, that I needed to worry about for this upgrade.

  • The customer is using a later version of VMware’s Horizon solution, and luckily for me, it is compatible with vSphere 6.7 U1, so no worries there.
  • The customer has Zerto 6.0 deployed, and unfortunately it needed to be upgraded to a compatible version.
  • The customer has Actifio backup solution, but that is also running a compatible version, so again no need to update it.

Lets Get those ESXi 5.5 hosts Upgraded to 6.5

I needed to schedule an outage with the customer, as they had three offsite locations, with two ESXi 5.5 hosts each. These hosts were using local storage to house and run the VMs, so even though they were in a host cluster, HA was not an option, and the VMs would need to be powered off.

Once I had the outage secured, I was able to move forward with upgrading these six hosts to ESXi 6.5.

Time to Upgrade the vSphere Distributed Switch (VDS)

For this portion of the upgrade, I only needed to upgrade the customers VDS’s to 6.5. This portion of the upgrade was fast, and I was able to do it mid day without the customer experiencing an outage. We did submit a formal maintenance request for visibility, and CYA. Total upgrade time to do all of their VDS’s was less than 15 minutes. Each switch took roughly a minute.

Upgrade the External Platform Services Controllers Before the vCenter Appliances

Now that I had all hosts to a compatible ESXi 6.5 version, I can move forward with the upgrade. I was able to do this upgrade during the day, as the customer would only lose the ability to manage their VMs using the vCenters. I made backups of the PSC and vCSA databases, and created snapshots of the VMs just in case.

I first needed to upgrade the external PSCs (3) to 6.7 U1, so I simply attached the vCSA.iso to my jump VM, and launched the .exe. I did this process one PSC at a time until they were all upgraded to 6.7 U1.

Upgrade the vCenter Appliances to 6.7 Update 1

Now that the external platform services controllers are on 6.7 U1, it is time to upgrade the vCenters. The process is the same with the exe, so I just did one vCenter at a time. Both the external PSC’s and the vCSA’s upgraded without issue, and within a couple of hours both the external PSC’s and vCSA’s had finished the vSphere 6.7 Update 1.

Upgrade Compatible ESXi Hosts to 6.7 Update 1

I really wanted to use the now embedded VMware Update Manager (VUM), but I either faced users who re-attached ISO’s to their Horizon VMs, or had administrators who were upgrading/installing VMware Tools. In one cluster I even happened to find a host that had improper networking configured compared to its peers in the cluster. Once I got all of that out of the way, I was able to schedule VUM to work its way down through each cluster, and upgrade the ESXi hosts to 6.7 Update 1. There were still some fringe cases where VUM wouldn’t work as intended, and I needed to do one host at a time.

Conclusion for the Upgrade

In the end, upgrading the customer’s three environments, vCSA, PSC and ESXi to 6.7 Update 1 took me about a couple of weeks to do alone. Not too shabby considering I finished ahead of schedule, even with all of the issues I faced. After the upgrade, the customer started having their Cisco UCS blades purple screen at random. After opening a case with GSS, that week Cisco came out with an emergency patch for the fnic driver, on the customer’s UCS blades, for the very issue they were facing. The customer was able to quickly patch the blades. Talk about good timing.

Part 2 Incoming

Part 2 of this series will focus on using the vCenter Converge Tool. Stay tuned.

Blog Date: 4/15/2019

2019 VMware vExpert Announcement

It’s that time of year again. I’m honored and humbled to continue to be apart of the VMware vExpert program. This program challenges me every day to continue to learn, and contribute to the #vCommunity. For me, this isn’t just some title. This is a family of community warriors where we learn from and help each other grow. Everyone in some way gives back to the community. This year, I am also excited to try my hand at public speaking, and give back to the VMUG community as a community session speaker. I don’t think that I would have had the courage to apply to be a speaker, if it wasn’t for my fellow vExperts encouraging me to do so.

Congrats to all the new and returning vExperts! https://blogs.vmware.com/vexpert/2019/03/07/vexpert-2019-award-announcement/