vSphere with Tanzu: Project Contour TLS Delegation Invalid – Secret Not Found

Blog Date: June 25, 2021
Tested on vSphere 7.0.1 Build 17327586
vSphere with Tanzu Standard

On a recent customer engagement, we ran into an issue where after we deployed project Contour, and created a TLS delegation “contour-tls”, but we ran into an issue where Contour did not like the public wildcard certificate we provided. We were getting an error message “TLS Secret “projectcontour/contour-tls” is invalid: Secret not found.”

After an intensive investigation to make sure everything in the environment was sound, we came to the conclusion that the “is invalid” part of the error message suggested that there was something wrong with the certificate. After working with the customer we discovered that they included the Root, the certificate, and the intermediate authorities in the PEM file. The root doesn’t need to be in the pem.  Just the certificate, and the intermediate authorities in descending order. Apparently that root being in the pem file made Contour barf. Who knew?

You could possibly see that the certificate is the issue by checking the pem data for both the <PrivateKeyName>.key and the <CertificateName>.crt by running the following commands, and comparing the pem output. IF it doesn’t match this could be your issue as well. The “<>” should be updated with your values, and don’t include these “<” “>”.

openssl pkey -in <PrivateKeyName>.key -pubout -outform pem | sha256sum
openssl x509 -in <CertificateName>.crt -pubkey -noout -outform pem | sha256sum

Below are the troubleshooting steps we took, and what we did to resolve the issue. We were using Linux, and had been logged into vSphere with Tanzu already. Did I mention that I hate certificates? But I digress….

The Issue:

You had just deployed a TKG cluster, and then deployed Project Contour as the ingress controller that uses a load balancer to be the single point of entry for all external users. This connection terminates SSL connections, and you have applied a public wildcard certificate to it. You created the TLS secret, and have created the TLS delegation, so that new apps deployed to this TKG cluster can delegate TLS connection terminations to contour. However, after you deployed your test app to verify the TLS delegation is working, you see a status of “Invalid. At least one error present, see errors for details.”, after running the following command:


kubectl get httpproxies

Troubleshooting:

  1. You run the following command to gather more information, and see in the error message: “Secret not found” Reason: “SecretNotValid
kubectl describe httpproxies.projectcontour.io

2. You check to make sure the TLS Secret was created in the right namespace with the following command, and you see that it is apart of the desired namespace. In this example, our namespace was called projectcontour, and the TLS secret was called contour-tls.


kubectl get secrets -A

3. You check the TLS delegation to make sure it was created with the following command. In this example ours was called contour-delegation, and our namespace is projectcontour.

kubectl get tlscertificatedelegations -A

4. You look at the contents of the tlscertificatedelegations with the following command, and nothing looks out of the ordinary.

kubectl describe tlscertificatedelegations -A

5. You check to see the secrets of the namespace with the following command. In this example our namespace is called projectcontour and we can see our TLS delegation contour-tls.


kubectl get secrets --namespace projectcontour

6. You validate contour-tls has data in it with the following command. In this example, our namespace is projectcontour and our TLS is contour-tls.

kubectl get secrets --namespace projectcontour contour-tls -o yaml

In the yaml output, up at the top you should see tls.crt: with data after

Down towards the bottom of the yaml output, you should see tls.key with data after

Conclusion: Everything looks proper on the Tanzu side. Based on the error message we saw “TLS Secret “projectcontour/contour-tls” is invalid: Secret not found.” The “is invalid” part could suggest that there is something wrong with the contents of the TLS secret.  At this point, the only other possibility would be that the public certificate has something wrong and needs to be re-generated.   The root doesn’t need to be in the pem.  Just the certificate for the site, and intermediate authorities in descending order.

The Resolution:

  1. Create and upload the new public certificate for contour to vSphere with Tanzu.
  2. We’ll need to delete the secret and re-create it.  Our secret was called “contour-tls”, and the namespace was called “projectcontour”.
kubectl delete secrets contour-tls -n projectcontour

3. We needed to clean our room, and delete the httpproxies we created in our test called test-tls.yml, and an app that was using the TLS delegation. In this example it was called tls-delegation.yml

kubectl delete -f test-tls.yml
kubectl delete -f tls-delegation.yml

4. Now we created a new secret called contour-tls with the new cert. The <> indicates you need to replace that value with your specific information. The “<>” does not belong in the command.

kubectl create secret tls contour-tls -n projectcontour --cert=<wildcard.tanzu>.pem --key=<wildcard.tanzu>.key

5. Validate the pem values for .key and .crt match. The <> indicates you need to replace that value with your specific information. The “<>” does not belong in the command.


openssl pkey -in <PrivateKeyName>.key -pubout -outform pem | sha256sum

openssl x509 -in <CertificateName>.crt -pubkey -noout -outform pem | sha256sum

6. If the pem numbers match, the certificate is valid. Lets go ahead an re-create the tls-delegation.  Example command:

kubectl apply -f tls-delegation.yml

7. Now you should be good to go. After you deploy your app, you should be able to check the httpproxies again for Project Contour, and see that it has a valid status

kubectl get httpproxies.projectcontour.io

If all else fails, you can open a ticket with VMware GSS to troubleshoot further.

vSphere with Tanzu Validation and Testing of Network MTU

Blog Date: June 18, 2021
Tested on vSphere 7.0.1 Build 17327586
vSphere with Tanzu Standard


On a recent customer engagement, we ran into an issue where vSphere with Tanzu wasn’t successfully deploying. We had intermittent connectivity to the internal Tanzu landing page IP. What we were fighting ended up being inconsistent MTU values configured both on the VMware infrastructure side, and also in the customers network. One of the many prerequisites to a successful installation of vSphere with Tanzu, is having a consistent MTU value of 1600.


The Issue:


Tanzu was just deployed to an NSX-T backed cluster, however you are unable to connect to the vSphere with Tanzu landing page address to download Kubernetes CLI Package via wget.  Troubleshooting in NSX-T interface shows that the load balancer is up that has the control plane VMs connected to it.


Symptoms:

  • You can ping the site address IP of the vSphere with Tanzu landing page
  • You can also telnet to it over 443
  • Intermittent connectivity to the vSphere with Tanzu landing page
  • Intermittent TLS handshake errors
  • vmkping tests between host vteps is successful.
  • vmkping tests from hosts with large 1600+ packet to nsx edge node TEPs is unsuccessful.


The Cause:


Improper/inconsistent MTU settings in the network data path.  vSphere with Tanzu requires minimum MTU of 1600.   The MTU size must be 1600 or greater on any network that carries overlay traffic.  See VMware documentation here:   System Requirements for Setting Up vSphere with Tanzu with vSphere Networking and NSX Advanced Load Balancer (vmware.com)


vSphere with Tanzu Network MTU Validations:


These validations should have been completed prior to the deployment. However, in this case we were finding inconsistent MTU settings. So to simplify, these are what you need to look for.

  • In NSX-T, validate that the MTU on the tier-0 gateway is set to a minimum of 1600.
  • In NSX-T, validate that the MTU on the edge transport node profile is set to a minimum of 1600.
  • In NSX-T, validate that the MTU on the host uplink profile is set to a minimum of 1600.
  • In vSphere, validate that the MTU on the vSphere Distributed Switch (vDS) is set to a minimum of 1600.
  • In vSphere, validate that the MTU on the ESXi management interface (vmk0) is set to a minimum of 1600.
  • In vSphere, validate that the MTU on the vxlan interfaces on the hosts is set to a minimum of 1600.


Troubleshooting:

In the Tanzu enabled vSphere compute cluster, SSH into an ESXi host, and ping from the host’s vxlan interface to the edge TEP interface.  This can be found in NSX-T via: System, Fabric and select Nodes, edge transport nodes, and find the edges for Tanzu. The TEP interface IPs will be to the right.  In this lab, I only have the one edge. Production environments will have more.

  • In this example, vxlan was configured on vmk10 and vmk11 on the hosts. Your mileage may vary.
  • We are disabling fragmentation with (-d) so the packet will stay whole. We are using a packet size of 1600
# vmkping -I vmk10 -S vxlan -s 1600 -d <edge_TEP_IP>
# vmkping -I vmk11 -S vxlan -s 1600 -d <edge_TEP_IP>
  • If the ping is unsuccessful, we need to identify the size of the packet that can get through.  Try a packet size of 1572. If unsuccessful try 1500.  If unsuccessful try 1476. If unsuccessful try 1472, etc.

To test farther up the network stack, we can perform a ping something that has a different VLAN, subnet, and is on a routable network.  In this example, the vMotion network is on a different network that is routable. It has a different VLAN, subnet, and gateway.  We can use two ESXi hosts from the Tanzu enabled cluster.

  1. Open SSH sessions to ESXi-01 and ESXi-02.
  2. On ESXi-02, get the PortNum for the vMotion vmk. On the far left you will see the PortNum for the vMotion enabled vmk. Run the following command:
# net-stats -l

3. Run a packet capture on ESXi-02 like so:

# pktcap-uw --switchport <vMotion_vmk_PortNum> --proto 0x01 --dir 2 -o - | tcpdump-uw -enr -

4. On the ESXi-01 session, use the vmkping command to ping the vMotion interface of ESXi-02.  In this example we use a packet size of 1472 because that was the packet size the could get through, and option -d to prevent fragmentation.

# vmkping -I vmk0 -s 1472 -d <ESXi-02_vMotion_IP>

5. On the ESXi-02 session, we should now see six or more entries. Do a CTRL+C to cancel the packet capture.

6. Looking at the packet capture output on ESXi-02, We can see on the request line that ESXi-01 MAC address made a request to ESXi-02 MAC address.

  • On the next line for reply, we might see a new MAC address that is not ESXi-01 or ESXi-02.  If that’s the case, then give this MAC address to the Network team to troubleshoot further.



Testing:

Using the ESXi hosts in the Tanzu enabled vSphere compute cluster, we can ping from the host’s vxlan interface to the edge TEP interface.

  • The edge TEP interface can be found in NSX-T via: System, Fabric and select Nodes, edge transport nodes, and find the edges for Tanzu. The TEP interface IPs will be to the far right.
  • You will need to know what host vmks the vxlan is enabled. In this example we are using vmk10 and vmk11 again.

In this example we are using vmk10 and vmk11 again. We are disabling fragmentation with (-d) so the packet will stay whole. We are using a packet size of 1600. These should now be successful.

The commands will look something like:

# vmkping -I vmk10 -S vxlan -s 1600 -d <edge_TEP_IP>
# vmkping -I vmk11 -S vxlan -s 1600 -d <edge_TEP_IP>

On the ESXi-01 session, use the vmkping command to ping something on a different routable network, so that we can force traffic out of the vSphere environment and be routed. In this example just like before, we will be using the vMotion interface of ESXi-02. Packet size of 1600 should now work. We still use option -d to prevent fragmentation.

# vmkping -I vmk0 -s 1600 -d <ESXi-02_vMotion_IP>

On the developer VM, you should now be able to download the vsphere-plugin.zip from the vSphere with Tanzu landing page with the wget command.

# wget https://<cluster_ip>/wcp/plugin/linux-amd64/vsphere-plugin.zip

With those validations out of the way, you should now be able to carry on with the vSphere with Tanzu deployment.

Collect Windows/Linux Virtual Server System Logs Using vRealize Log Insight

Blog Date: 06/16/2021
vRealize Log Insight 8.3

I recently had a customer engagement with Log Insight, and not only did they want to use it as the main log collector for their infrastructure, but they also wanted it to collect logs from their Windows virtual servers. Good news! There is a content pack called “Microsoft – Windows” that should be installed.  This contains a configuration template for windows servers.  This is used to create a group, so that every time a windows box has the agent installed, the agent picks up the settings from the server and forwards its logs.  Otherwise, we must configure each agent ini file manually which is not ideal.    There is also a Linux content pack on the market place that can be setup as well. This blog will focus on the Windows content pack, but the steps for Linux is very similar.

The ” Microsoft – Windows” content pack can be found in the Marketplace in Log Insight located on the Content Packs tab.

Once this is installed, go back to the Administration tab, and click on agents in the left column.  Click the down carrot next to All Agents, and find “Microsoft – Windows” in the list.  To the right of it, click the double box icon to copy the template.

Name the group, and click copy.

Now you configure the filter to find the windows server.  In my example, I chose “OS” “Matches” “Microsoft Windows Server 2016 Datacenter”. Click Save New Group button below.

Now that we have a Windows group defined for the agents, go ahead and install the agent on the Windows Server, and now it will have a proper configuration and begin forwarding its logs.  If the box already has the agent installed, you might need to restart the agent, or reinstall it. 

Likewise, there is also a Linux content pack on the market place that can be setup as well.  For either, we don’t have to create one group to rule them all.  You can get creative with your group filters, and have specific groups for specific server functions. 

VMware

Configuring vCloud Director 10 Part 1. Adding vSphere lookup Service and vCenter.

This is a continuation of deploying VMware Cloud Director 10. In my last post, I walked through deploying additional appliances for database high availability (here). Today we will add the vSphere lookup service and the vCenter.

Configuring the vSphere Lookup Service

Log into the vCD provider interface, and switch to the Administrator view by clicking the menu to the right of vCloud Director logo. Then under Settings on the right select vSphere Services.

After clicking the “REGISTER” link in the upper right, we’ll be able to add the lookup service URL link for the vCenter we’ll connect to a little later. Registering with vSphere Lookup Service enables vCloud Director to discover other services, and allows other services to discover this instance of vCloud Director.

Once all of your information is added, click the Register button. Watch the task for successful completion below.

Adding the vCenter

Click on the menu again to the right of vCloud Director logo, and select the vSphere Resources. On the top menu option to the right “vCenters”, click the Add button.

On page 1 of the wizard, fill in the required information for the vCenter. We already configured the lookup service, so we can leave that option selected here to provide the URL. Click next.

On page 2, add the information for the NSX-V appliance. Click Next.

Click finish, and then wait for the vCenter to connect and show a status of Normal.

Now that we have the vCenter connected, we can proceed to setting up and configuring the Provider Virtual Data Center (PVDC). This is required in order to make the vCenter resources such as compute and storage available to tenants. I’ll go over configuring the PVDC in the next blog.

Get and set NTP settings of all VMware Hosts with PowerCLI

Quiet often when I connect to customer sites to conduct a health check, I sometimes find their hosts having different NTP settings, or not having NTP configured at all. Probably one of the easiest one-liners I keep in my virtual Rolodex of powerCLI one-liners, is the ability to check NTP settings of all hosts in the environment.

Checking NTP with Powercli

Get-VMHost | Sort-Object Name | Select-Object Name, @{N=”Cluster”;E={$_ | Get-Cluster}}, @{N=”Datacenter”;E={$_ | Get-Datacenter}}, @{N=“NTPServiceRunning“;E={($_ | Get-VmHostService | Where-Object {$_.key-eq “ntpd“}).Running}}, @{N=“StartupPolicy“;E={($_ | Get-VmHostService | Where-Object {$_.key-eq “ntpd“}).Policy}}, @{N=“NTPServers“;E={$_ | Get-VMHostNtpServer}}, @{N="Date&Time";E={(get-view $_.ExtensionData.configManager.DateTimeSystem).QueryDateTime()}} | format-table -autosize

With this rather lengthy command, I can get everything that is important to me.

We can see from the output that I have a single host in my Dev-Cluster that does not have NTP configured. Quiet often I find customers that have mis-configured NTP settings, do not make use of host profiles that can catch and address issues like this.

If you also wanted to see the incoming, outgoing and protocols settings, you could use the following:

Get-VMHost | Sort-Object Name | Select-Object Name, @{N=”Cluster”;E={$_ | Get-Cluster}}, @{N=”Datacenter”;E={$_ | Get-Datacenter}}, @{N=“NTPServiceRunning“;E={($_ | Get-VmHostService | Where-Object {$_.key-eq “ntpd“}).Running}}, @{N="IncomingPorts";E={($_ | Get-VMHostFirewallException | Where-Object {$_.Name -eq "NTP client"}).IncomingPorts}}, @{N="OutgoingPorts";E={($_ | Get-VMHostFirewallException | Where-Object {$_.Name -eq "NTP client"}).OutgoingPorts}}, @{N="Protocols";E={($_ | Get-VMHostFirewallException | Where-Object {$_.Name -eq "NTP client"}).Protocols}}, @{N=“StartupPolicy“;E={($_ | Get-VmHostService | Where-Object {$_.key-eq “ntpd“}).Policy}}, @{N=“NTPServers“;E={$_ | Get-VMHostNtpServer}}, @{N="Date&Time";E={(get-view $_.ExtensionData.configManager.DateTimeSystem).QueryDateTime()}} | format-table -autosize

Set NTP with Powercli

You can setup NTP on hosts with powercli as well. I have everything in my lab pointed to ‘pool.ntp.org’, so my example will use that.

code snippet

Get-VMHost | Add-VMHostNtpServer -NtpServer pool.ntp.org

Most likely you would have multiple corporate NTP servers you’d need to point to, and that is easily done by separating them with a comma. An example of having two: Instead of having just ‘pool.ntp.org’ I’d have ‘ntp-server01,ntp-server02’.

The next thing needed is the startup policy. VMware has three different options to choose from. on = Start and stop with host, automatic = start and stop with port usage, and off = start and stop manually. In my lab I have the policy set to on.

code snippet

Get-VMHost | Get-VmHostService | Where-Object {$_.key -eq "ntpd"} | Set-VMHostService -policy "on"

With that in mind, the following command I can make the NTP settings of all hosts consistent. This command assumes that I only have one NTP server. I am also stopping and starting the NTP service. It is also worth mentioning that each host that already has the ntp server ‘pool.ntp.org’, will throw a red error that the NtpServer already exists.

Get-VMHost | Get-VmHostService | Where-Object {$_.key -eq "ntpd"} | Stop-VMHostService -Confirm:$false; Get-VMHost | Get-VmHostService | Where-Object {$_.key -eq "ntpd"} | Set-VMHostService -policy "on"; Get-VMHost | Add-VMHostNtpServer -NtpServer pool.ntp.org; Get-VMHost | Get-VMHostFirewallException | Where-Object {$_.Name -eq "NTP client"} | Set-VMHostFirewallException -Enabled:$true; Get-VMHost | Get-VmHostService | Where-Object {$_.key -eq "ntpd"} | Start-VMHostService

Now NTP should be consistent across all hosts. Re-run the first command to validate.

Deploying Additional vCloud Director v10 Appliances for Database High Availability Configuration

Blog Date: 4/14/2020

In my last blog, I walked through the process of Deploying the vCloud Director Appliance v10, and today’s blog will feature the process of deploying two additional standby appliances to create an HA database configuration for vCD. To get an idea of what that architecture would look like, I’ll rip this excellent diagram from VMware’s own documentation.

Deploying additional appliances are pretty straight forward, so lets get started.

1 – Find and upload the OVF for vCD.

2 – Name the VM, select the datacenter and virtual machine folder.

3 – Select the compute cluster

4 – The primary appliance has already been deployed. It is important to note that the same size standby appliance has to be deployed. Because our first primary appliance was deployed as small, so to shall the standby appliance. VMware’s sizing guide and be found here.

5 – Select desired storage disk format and storage where the appliance will reside.

6 – Configure the networks for each network interface, keeping in mind that they will be in reverse order as discussed before.

7 – Fill out the template customization page just like before. Remember all fields including the administrator email are required.

Note:
– Be sure to use the same “System name” that was used for the original vCD primary appliance deployment.
– For the “Installation ID” section, make sure this value reflects increases with the number of appliance being deployed. In this demonstration I am deploying the 2nd and 3rd appliances, so the installation IDs would be 2 and 3 respectively.

8 – On the summary page, verify the deployment and click finish.

9 – Before starting the appliance, it may be a good idea to take a snapshot. Once the appliance has been started, and the configuration scripts attempt to run and fail, the appliance will need to be redeployed. I’d also take a snapshot of the primary appliance, to roll back any failed attempts to join.

10 – Once you have started the appliance, watch for the “Guest OS Initialization Script”. This should take a couple of minutes to run in order to be successful. If it runs for less than 10 seconds, then there was a problem and the appliance will need to be redeployed.

11 – After the appliance boots, look at the /opt/vmware/var/log/vcd/setupvcd.log to validate a successful cluster join. This log can also be used if the appliance deployment failed.

A successful join would look something like this:

12 – Now deploy the 3rd standby appliance using steps 1 through 11.

13 – Once the 3rd appliance has been deployed, it would be a good idea to log into the primary appliance’s 5480 page to validate the health of the new DB cluster.

14 – Log into the provider page (https://appliance_name.com/provider)

15 – Here we can also see the state of the cloud cells.

End – I’ll finish this blog post here. In my next blog, I’ll walk through the steps of configuring the newly deployed vCloud Director environment. Stay tuned.

Deploying vCloud Director v10 Appliance.

Blog Date: 04/07/2020

Prior to this blog post, blogged and walked through the steps of creating a NFS linux server using CentOS 7. You can find the link to that blog post here.

The VMware Cloud Director (vCD) platform is primarily used by service providers, as a cloud offering for their customers. Back when I worked for a service provider, the bulk of my experience came from the version 8.x days, when vCD was a software package to be installed on a Linux VM. Fast forward a few years, and I’ve started deploying vCD 9.7 and vCD 10 appliances for VMware customers, part of Professional Services engagements for VMware that I’ve been working on. Interestingly enough, both customers were not cloud providers, but had specific use cases that vCD achieved.

The vCD appliance deployment certainly is not as clean as other appliances like vCSA and vROps, and I’ve found there to be a few gotchas that can lead to a failed appliance deployment.

Deploying the vCD appliance

  1. Like most appliance deployments, we’ll deploy an ovf template.

2. Name the virtual machine, and select desired deployment datacenter and VM folder location.

3. Select the desired compute location

4. Select the size of the appliance. As this is the first primary cell, select an option that contains “primary”. If you are deploying appliance cells two and three, then you’d select “standby” here if you are creating a cluster. The “vCD Cell Application” would be used for the fourth appliance.
– You’ll also notice two different sizes: Small and Large. These will depend on your environment needs. VMware’s official sizing documentation can be found here.

5. Select desired storage disk format and storage where the appliance will reside.

6. We’ve arrived at the first gotcha: Selecting the network. This is the only ovf deployment I’ve seen that lists the NICs in reverse order. VMware states in their official documentation that “the source network list might be in reverse order. Verify that you are selecting the correct destination network for each source network.” I have yet to see the networks display in the proper order. VMware also states that eth0 and eth1 must be on separate networks in their documentation here. I’ve asked GSS but wasn’t given an answer why. I haven’t found an issue with both connections being on the same network, but for demonstration purposes we’ll do as the official documentation says. Note: I have noticed at least in my lab that the appliance uses eth1 to connect to the NFS server.

7. The second gotcha: Filling out the template customization page. It’s not indicated here that ALL fields are REQUIRED. Yes even the email address is a hard requirement, even though no other appliance deployment requires it.

8. On the summary page, verify the deployment and click finish.

9. Before starting the appliance, it may be a good idea to take a snapshot. Once the appliance has been started, and the configuration scripts attempt to run and fail, the appliance will need to be redeployed.

10. Once you have started the appliance, watch for the “Guest OS Initialization Script”. This should take a couple of minutes to run in order to be successful. If it runs for less than 10 seconds, then there was a problem and the appliance will need to be redeployed.

10a – If the appliance failed to deploy, log into the appliance as root, and look at the /opt/vmware/var/log/vcd/setupvcd.log for details.

10b – On a successful run, you’d see something similar to:

11 – On a successful deployment, log into the appliance 5480 page, and you should see something similar to:

12 – The primary appliance has successfully been deployed. If additional standby appliances are needed, now would be the best time to deploy them.

End – That’s it. In upcoming blog posts, I’ll walk through the process of deploying additional standby appliances, and the initial configuration of vCloud Director.

How To Setup NFS Server on CentOS 7 for vCloud Director v10 Appliance

Blog Date: 10/25/2019
Lasted Updated: 04/06/2020

For the purposes of this demonstration, I will be configuring NFS services on a CentOS 7 VM, deployed to a vSphere 6.7 U3 homelab environment.

NFS Server VM Configuration

Host Name: cb01-nfs01
IP Address: 10.0.0.35
CPU: 2
RAM: 4GB

Disk 1: 20GB – Linux installation (thin provisioned)
Disk 2: 100GB – Will be used for the vCD NFS share (thin provisioned)

Configure the vCD NFS share disk

For this demonstration, I have chosen not to configure Disk 2 that was added to the VM. Therefore, this “how-to” assumes that a new disk has been added to the VM, and the NFS server has been powered on after.

1) Open a secure shell to the NFS server. I have switched to the root account.
2) On my NFS server, the new disk will be “/dev/sdb”, if you are unsure run the following command to identify the new disk on yours:

fdisk -l

3) We need to format the newly added disk. In my case /dev/sdb. So run the following command:

fdisk /dev/sdb

4) Next with the fdisk utility, we need to partition the drive. I used the following sequence:
(for new partition) : n
(for primary partition) : p
(default 1) : enter
(default first sector) : enter
(default last sector) : enter

5) Before saving the partition, we need to change it to ‘Linux LVM’ from its current format ‘Linux’. We’ll first use the option ‘t’ to change the partition type, then use the hex code ‘8e’ to change it to Linux LVM like so:

Command (m for help): t
Selected partition 1

Hex code (type L to list all codes): 8e
Changed type of partition ‘Linux’ to ‘Linux LVM’.

Command (m for help): w

Once you see “Command (m for help):” type ‘w’ to save the config.

Create a ‘Physical Volume, Volume Group and Logical Volume

6) Now that the partition is prepared on the new disk, we can go ahead and create the physical volume with the following command:

# pvcreate /dev/sdb1

7) Now we to create a volume group. You can name it whatever suites your naming standards. For this demonstration, I’ve created a volume group named vg_nfsshare_vcloud_director using /dev/sdb1, using the following command:

# vgcreate vg_nfsshare_vcloud_director /dev/sdb1

Creating a volume group allows us the possibility of adding other devices to expand storage capacity when needed.

8) When it comes to creating logical volumes (LV), the distribution of space must take into consideration both current and future needs. It is considered good practice to name each logical volume according to its intended use.
– In this example I’ll create one LV named vol_nfsshare_vcloud_director using all the space.
– The -n option is used to indicate a name for the LV, whereas -l (lowercase L) is used to indicate a percentage of the remaining space in the container VG.
The full command used looks like:
# lvcreate -n vol_nfsshare_vcloud_director -l 100%FREE vg_nfsshare_vcloud_director

9) Before a logical volume can be used, we need to create a filesystem on top of it. I’ve used ext4 since it allows us both to increase and reduce the size of the LV.
The command used looks like:

# mkfs.ext4 /dev/vg_nfsshare_vcloud_director/vol_nfsshare_vcloud_director

Writing the filesystem will take some time to complete. Once successful you will be returned to the command prompt.

Mounting the Logical Volume on Boot

10) Next, create a mount point for the LV. This will be used later on for the NFS share. The command looks like:

# mkdir -p /nfsshare/vcloud_director

11) To better identify a logical volume we will need to find out what its UUID (a non-changing attribute that uniquely identifies a formatted storage device) is. The command looks like:

# blkid /dev/vg_nfsshare_vcloud_director/vol_nfsshare_vcloud_director

In this example, the UUID is: UUID=2aced5a0-226e-4d36-948a-7985c71ae9e3

12) Now edit the /etc/fstab and add the disk using the UUID obtained in the previous step.

# vi /etc/fstab

In this example, the entry would look like:

UUID=2aced5a0-226e-4d36-948a-7985c71ae9e3 /nfsshare/vcloud_director ext4 defaults 0 0

Save the change with: wq

13) Mount the new LV with the following command:

# mount -a

To see that it was successfully mounted, use the following command similar to:

# mount | grep nfsshare

Assign Permissions to the NFS Share

14) According to the Preparing the Transfer Server Storage section of the vCloud DIrector 10.0 guide, you must ensure that its permissions and ownership are 750 and root:root .

Setting the permissions on the NFS share would look similar to:

# chmod 750 /nfsshare/vcloud_director

Setting the ownership would look similar to:

# chown root:root /nfsshare/vcloud_director

Install the NFS Server Utilities

15) Install the below package for NFS server using the yum command:

# yum install -y nfs-utils

16) Once the packages are installed, enable and start NFS services:

# systemctl enable nfs-server rpcbind

# systemctl start nfs-server rpcbind

16) Modify /etc/exports file to make an entry for the directory /nfsshare/vcloud_director .

– According to the Preparing the Transfer Server Storage guide, the method for allowing read-write access to the shared location for two cells named vcd-cell1-IP and vcd-cell2-IP is the no_root_squash method.

# vi /etc/exports

17) For this demonstration, my vCD appliance IP on the second nic is 10.0.0.38, so I add the following:

/nfsshare/vcloud_director 10.0.0.38(rw,sync,no_subtree_check,no_root_squash)

– There must be no space between each cell IP address and its immediate following left parenthesis in the export line. If the NFS server reboots while the cells are writing data to the shared location, the use of the sync option in the export configuration prevents data corruption in the shared location. The use of the no_subtree_check option in the export configuration improves reliability when a subdirectory of a file system is exported.
– As this is only a lab, I only have a single vCD appliance for testing. If a proper production deployment, add additional lines for each appliance IP.

18) Each server in the vCloud Director server group must be allowed to mount the NFS share by inspecting the export list for the NFS export. You export the mount by running exportfs -a to export all NFS shares. To re-export use exportfs -r.

# exportfs -a

– To check the export, run the following command:

# exportfs -v

– Validate NFS daemons are running on the server by using rpcinfo -p localhost or service nfs status. NFS daemons must be running on the server.

# rpcinfo -p localhost

or

# systemctl status nfs-server.service

Configure the Firewall

19) We need to configure the firewall on the NFS server to allow NFS client to access the NFS share. To do that, run the following commands on the NFS server.

# firewall-cmd --permanent --add-service mountd

# firewall-cmd --permanent --add-service rpc-bind
# firewall-cmd --permanent --add-service nfs
# firewall-cmd --reload

20) That’s it. Now we can deploy the vCloud Director 10.0 appliance(s).

Optional NFS Share Testing

I highly recommend testing the NFS share before continuing with the vCloud DIrector 10.0 appliance deployment. For my testing, I have deployed a temporary CentOS 7 VM, with the same hostname and IP address as my first vCD appliance. I have installed nfs-utils on my test VM.
# yum install -y nfs-utils

OT-1) Check the NFS shares available on the NFS server by running the following command on the test VM. change the IP and share here to your NFS server.

# showmount -e 10.0.0.35

As you can see, my mount on my NFS server is showing one exported list for 10.0.0.38, my only vCD appliance

OT-2) Create a directory on NFS test VM to mount the NFS share /nfsshare/vcloud_director which we have created on the NFS server.

# mkdir -p /mnt/nfsshare/vcloud_director

OT-3) Use below command to mount the NFS share /nfsshare/vcloud_director from NFS server 10.0.0.35 in /mnt/nfsshare/vcloud_director on NFS test VM.

# mount 10.0.0.35:/nfsshare/vcloud_director /mnt/nfsshare/vcloud_director

OT-4) Verify the mounted share on the NFS test VM using mount command.

# mount | grep nfsshare

You can also use the df -hT command to check the mounted NFS share.

# df -hT

OT-5) Next we’ll create a file on the mounted directory to verify the read and write access on NFS share. IMPORTANT** during the vCD appliance deployment, it is expected that this directory is empty, else it could make the deployment fail. Remember to cleanup after the test.

# touch /mnt/nfsshare/vcloud_director/test

OT-6) Verify the test file exists by using the following command:

# ls -l /mnt/nfsshare/vcloud_director/

OT-7) Clean your room. Cleanup the directory so that it is ready for the vCD deployment.

# rm /mnt/nfsshare/vcloud_director/test

After successfully testing the share, we now know that we can write to that directory from the vCD appliance IP address, and that we can remove files.




In my next post, I will cover deploying the vCloud Director 10.0 appliance. Stay tuned!

Key Takeaways from VMworld US 2019 in San Francisco.

Looking back on this past week, all I can say is that it was pretty crazy. It was my first time to San Francisco, and I honestly left with mixed feelings on the City.

VMworld itself was pretty good! VMware cut back the general sessions to just two days (Monday and Tuesday), and I am honestly conflicted about the missing Thursday general session, as they usually showcase some non VMware related tech for this session.

If I could sum up VMworld in just one word this year, it would be: Kubernetes

Image result for kubernetes meme

VMware debuted their cloud management solution VMware Tanzu with partnership with Pivital, and showcased the ability to manage multiple Kubernetes clusters across multiple clouds, all from one central management dashboard, and Project Pacific, VMware’s endeavor to embed Kubernetes into vSphere.

VMware also added the Odyssey competition this year just outside of the Hands on Labs area. This was in the HOL style, however this only gave attendees hints on what needed to be completed, and really allowed you to test your knowledge and skills in order to complete the task, without the hand holding that the typical HOL provides. Teams were able to compete against each other for the best times, and had some pretty decent prizes.

All in all, it was a decent VMworld, and they will be returning to San Francisco next year. I can’t say that I enjoyed the location, especially with the homeless problem San Francisco has, and I would much rather see VMworld bring it’s 20k+ attendees to a cleaner city, without the drugs, pan handlers, and human waste on the streets. You’d think that as someone who grew up on a farm, and is used to certain sights and smells, that it wouldn’t have bothered me so much, but this took me by surprise

This was also a special VMworld for me this year, as I was finally able to meet Pat Gelsinger. I can tell he really likes the community, and would love to stay longer and chat with everyone. I certainly would have loved the chance to talk with him longer, but I know he had other obligations that night.

The vExpert party was fun as always, and we were able to get a nice photo of the group.

The last session I attended this year was “If this then that for vSphere – the power of event-driven automation” with keynote speakers William Lam, and Michael Gasch. Several well known VMware employees and bloggers were in attendance, including Alan Renouf, who was two chairs down from me, and for this first time I felt this crippling awkwardness of wanting to take pictures with all of them, but was so star stuck that I couldn’t bring myself to it. I know these guys are just normal folks who just happen to be stars in the vCommunity, but I had to contain myself, and enjoy the keynote. Hopefully our paths will cross again, and I can personally meet them.

VMworld General Session, Day 2, Tuesday August 27th, 2019: VMware Tanzu demos, and new CTO announcement!

Day 3 of VMworld 2019 in San Francisco is underway, and it is the second day of General sessions. Clearly today’s theme is Kubernetes, and VMware’s Ray O’Farrell kicked off the keynote by talking about VMware Tanzu and Tanzu’s mission control.

The Keynote then included the integration of NSX-T with Tanzu. The ability to test changes, to see the impact on the environment before going live, was truly amazing

There was also an interesting demo with VMware Horizon and Workspace ONE, showcasing the usage deploying work spaces rapidly from the cloud, and creating zero-trust security policy withing workspace ONE with Carbon Black

Pat jumped up on stage to announce that Ray O’Ferrell (@ray_ofarrell) would be leading VMware’s cloud native apps division, and Greg Lavender (@GregL_VMware) was named the New CTO of VMware.

VMware also announced a limited edition t-shirt that would be given away later that day. VMware had roughly 1000 of these shirts made up, and luckily I was able to get a shirt before they ran out.

Plenty of people were upset about not getting a shirt due to the limited run. Gives a whole new meaning to nerd rage…. (sorry I couldn’t help myself).