Kubernetes API Server Endpoint FQDN Missing from Certificate SAN in vSphere with Tanzu Deployment.

Blog Date: June 17, 2021
Tested on vSphere 7.0.1 Build 17327586
vSphere with Tanzu Standard

The Issue:

On a recent customer engagement, we ran into an issue after applying a CA-signed certificate to a cluster with vSphere with Tanzu enabled. The customer could reach the Tanzu landing page (the internal Kubernetes site address) by its assigned IP address, and the browser showed the secure lock. However, they received an invalid certificate warning when connecting to the same site by FQDN. Upon closer inspection, we realized that the FQDN was not part of the certificate’s Subject Alternative Name (SAN). We had also found MTU inconsistencies in this customer’s environment, and we ended up redeploying vSphere with Tanzu a couple of times. There will be another blog post about that; for the purposes of this post, what matters is that on the last deployment we missed adding the FQDN during setup.

The Cause:

When you enable vSphere with Tanzu on a compute cluster, the deployment wizard asks for the API Server endpoint FQDN (fully qualified domain name). You will notice that this field is marked “Optional”.

However, if this value is not filled in during the deployment, it will not be included in the SAN when you create a CSR to apply a certificate to the Tanzu supervisor cluster.
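
One way to confirm the symptom from a workstation is to read the SAN from the certificate the endpoint is serving and look for the FQDN. The following is a minimal sketch and not part of the original troubleshooting. The function name san_has_name, the host tanzu.example.com, and port 443 are placeholders, and it assumes OpenSSL 1.1.1 or later (for the -ext option). It only matches exact DNS entries and does not evaluate wildcard names.

```shell
# san_has_name FQDN  (hypothetical helper name)
# Reads the text printed by "openssl x509 -noout -ext subjectAltName" on
# stdin and succeeds (exit 0) only if FQDN appears as an exact DNS entry.
san_has_name() {
  tr ',' '\n' \
    | sed 's/^[[:space:]]*//; s/[[:space:]]*$//' \
    | grep -ixF "DNS:$1" >/dev/null
}

# Example (placeholder host and port; replace with your own endpoint):
#   openssl s_client -connect tanzu.example.com:443 \
#       -servername tanzu.example.com </dev/null 2>/dev/null \
#     | openssl x509 -noout -ext subjectAltName \
#     | san_has_name tanzu.example.com && echo "FQDN is in the SAN"
```

If the function fails for the FQDN while the IP address is listed in the SAN, that matches the behavior described above.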


Currently, the only “easy” fix is to redeploy vSphere with Tanzu on the cluster, assuming you are early in the deployment.

However, if you have already deployed workloads this will be destructive.  Your only other option is to open a ticket with VMware GSS, and they will need to add the missing entry to the database on the vCenter.

I wouldn’t expect there to be a public KB article on this as we do not want customers editing the vCenter database without GSS guidance.  There is currently no way to add the missing API Server endpoint FQDN in the UI.  As of 6/16/2021, I heard an unconfirmed rumor that a feature request has been added for this, and there will be an option to edit this in the UI. However, there’s currently no ETA on when it will be added. 

A big shout-out to Nicholas M. in GSS for helping me resolve this. Even though I cannot share the full resolution here, I can at least help others troubleshoot.

Advanced Troubleshooting:

1. If you really need to confirm that this is your issue, open an SSH session (for example, with PuTTY) to the vCenter.

2. Next, check the database to find what MasterDNSNames value was entered at the time of deployment. In my test, I only have a single compute cluster that has vSphere with Tanzu enabled. Your mileage may vary if you have more than one cluster enabled. Enter the following command to view the table. We are not making changes here.

# PGPASSFILE=/etc/vmware/wcp/.pgpass psql -U wcpuser -d VCDB -h localhost -x -c "select desired_config from cluster_db_configs where cluster like 'domain-c%';" | less

3. Initially you will see a bunch of lines on your console. Press the “page down” key once or twice to get past these lines (if needed, press lowercase g to go back to the top). Look for MasterDNSNames. This is the API Server endpoint FQDN. If the value is null, the API Server endpoint FQDN was left blank during the setup.

4. You cannot edit this value here. A null value does, however, confirm that the API Server endpoint FQDN was not entered during the initial deployment.

5. Press q to quit.
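
If paging through the output is tedious, the same read-only query can be filtered instead of viewed in less. This is a sketch under assumptions: the helper name find_master_dns_names is made up for illustration, and it assumes the field appears in the output as the literal text MasterDNSNames with its value on the same line or on the next couple of lines. I have not confirmed the exact output layout, so adjust the amount of context if needed.

```shell
# find_master_dns_names  (hypothetical helper name)
# Reads the psql output on stdin and prints only the lines that mention
# MasterDNSNames, plus two lines of trailing context in case the value
# is printed on the following lines. Exits non-zero if nothing matches.
find_master_dns_names() {
  grep -i -A 2 'MasterDNSNames'
}

# Example, run as root on the vCenter (same query as above, still read-only):
#   PGPASSFILE=/etc/vmware/wcp/.pgpass psql -U wcpuser -d VCDB -h localhost -x \
#     -c "select desired_config from cluster_db_configs where cluster like 'domain-c%';" \
#     | find_master_dns_names
```

Inside less itself, typing /MasterDNSNames and pressing Enter jumps to the first match.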

As stated previously, if this is a fresh deployment, the easiest path forward would be to re-deploy vSphere with Tanzu on the compute cluster in vSphere.

If you already have running workloads, your only other option at this point would be to open a support request with VMware GSS.