VMware Cloud Foundation 3.0 Architecture Poster

September 24, 2018 captainvopsrd

I’m pleased to announce the release of the VMware Cloud Foundation 3.0 Architecture Poster. VMware Cloud Foundation includes the full software-defined data center stack, including compute, storage, networking, cloud management, and workload migration. This major release has increased hardware flexibility with support for customer-defined networking and broad support for all vSAN ReadyNodes. The post VMware Cloud Foundation 3.0 Architecture Poster appeared first on Cloud…

VMware Social Media Advocacy

Announcing General Availability of vRealize Suite 2018 & vCloud Suite 2018!

September 24, 2018 captainvopsrd

VMware vRealize Suite 2018 and vCloud Suite 2018, the industry-leading cloud management platform, are now available for download. As announced at VMworld 2018 in Las Vegas, VMware’s cloud management platform is further helping IT deliver developer-friendly infrastructure from any cloud with secure and consistent operations. The new vRealize Suite 2018 includes: vRealize Automation 7.5, vRealize… The post Announcing General Availability of vRealize Suite 2018 & vCloud Suite 2018! appeared first on…

High CPU utilization on NSX Appliance 6.2.4

September 23, 2018 CaptainvOPs

I realize that writing up this blog post now may be irrelevant, considering most, if not all, VMware customers are well beyond NSX appliance 6.2.4. But some folks may still find the information shared here useful. At the very least, the instructions for restarting the bluelane-manager service on the NSX appliance are still handy to keep in your Rolodex of commands.

There’s an interesting bug in NSX appliance versions 6.2.4 through 6.2.8, where CPU utilization slowly climbs, eventually maxing out at 100% after a few hours. Our environment ran vSphere 6, with roughly 60 hosts that were also on ESXi 6. We were using traditional SAN storage on FCoE, in this case a combination of IBM XIV and INFINIDATs. In most cases, we could just restart the NSX appliance, which would resolve the CPU utilization issue. Sometimes, however, the CPU utilization would climb back up to 100% within two hours. When the appliance CPU maxed out, the NSX Manager user interface would typically crash a few seconds later.

The Cause: (copied from KB2145934)

“This issue occurs when the PurgeTask process is not purging the proper amount of job tasks in the NSX database causing job entries to accumulate. When the number of job entries increase, the PurgeTask process attempts to purge these job entries resulting in higher CPU utilization which triggers (GC) Garbage Collection. The GC adds more CPU utilization.”

The only problem with the KB is that our environment was already on 6.2.4, so clearly the problem had not been resolved in that release.

To buy ourselves some time without needing to restart the NSX appliance, we found that simply restarting a service on the NSX appliance called ‘bluelane-manager’ had the same effect, but this was only a workaround.

You can take the following steps to restart the bluelane-manager service:

SSH to the NSX Manager using the ‘admin’ account.
To enter privileged (enable) mode, type:

en

To enter the engineering (tech support) shell, type:

st en

When prompted for the password, type:

IAmOnThePhoneWithTechSupport

To get the status of the bluelane-manager service, type:

/etc/rc.d/init.d/bluelane-manager status

To restart the bluelane-manager service, type:

/etc/rc.d/init.d/bluelane-manager restart

After a few seconds, you should notice that the NSX appliance user interface has returned to normal functionality. You can then log in and validate that CPU usage has fallen back to normal.
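The status and restart commands above can be wrapped in a small helper. This is only a sketch, assuming you are already in the shell reached via ‘en’ and ‘st en’ and that a POSIX shell is available there. The `BLUELANE_INIT` variable and the `restart_bluelane` function are names made up for this example; the init script path is the one given in the steps above.

```shell
#!/bin/sh
# Sketch only: wraps the status and restart commands from the steps above.
# BLUELANE_INIT is a variable introduced here for illustration; it defaults
# to the init script path given in the post.
BLUELANE_INIT="${BLUELANE_INIT:-/etc/rc.d/init.d/bluelane-manager}"

restart_bluelane() {
    # Show the service state before touching it (a non-zero exit is ignored).
    "$BLUELANE_INIT" status || true
    # Restart the service; give up if the init script reports failure.
    "$BLUELANE_INIT" restart || return 1
    # Show the state again so the result is visible.
    "$BLUELANE_INIT" status
}
```

Calling `restart_bluelane` runs status, restart, and status in that order, so the before and after states end up next to each other in the terminal.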

What made the issue worse was that we had hosts going into the purple diagnostic screen. I’m not talking one or two here. Imagine having over 20 ESXi hosts drop at the same time, during production hours, and keep in mind that all of these hosts were running customer workloads. If you’ll excuse the vulgarity, that certainly has a pucker factor exceeding 10. At the time, I was working for a service provider running vCloud Director, so the customers were basically sharing the ESXi host resources. We were also utilizing VMware’s Guest Introspection (GI) service, as we had Trend Micro deployed, and as a result most customers were sitting in the default security group.

Through extensive troubleshooting with VMware developers, we determined the following at a high level. With all customer VMs in the default NSX security group, every time a customer VM was powered on or off, created or destroyed, vMotioned, or replicated in or out of the environment, the change had to be synced back to the NSX appliance, which then synced with the ESXi hosts. Looking at specific logs on the ESXi hosts that only VMware had access to, we saw a backlog of sync instructions that the hosts would never have time to process, which was contributing to the NSX appliance CPU issue. This was also causing the hosts to eventually purple screen. Fun fact: by restarting the hosts we could buy ourselves close to two weeks before the issue would recur. However, performing many simultaneous vMotions would also cause 100% CPU on the NSX appliance, which would put us into a bad state again.
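To make the backlog behavior concrete, here is a toy model of a queue that receives work faster than it can drain. It is only an illustration of the arithmetic, not a reconstruction of how NSX queues sync instructions; the function name `backlog_after` and every number passed to it are made up.

```shell
#!/bin/sh
# Illustrative model only; the rates are made-up inputs, not measurements.
# backlog_after ARRIVALS PROCESSED INTERVALS
#   ARRIVALS  - sync instructions queued per interval
#   PROCESSED - sync instructions a host can work through per interval
#   INTERVALS - number of intervals to simulate
backlog_after() {
    arrivals=$1
    processed=$2
    intervals=$3
    backlog=0
    i=0
    while [ "$i" -lt "$intervals" ]; do
        backlog=$((backlog + arrivals - processed))
        # A queue cannot hold fewer than zero items.
        if [ "$backlog" -lt 0 ]; then
            backlog=0
        fi
        i=$((i + 1))
    done
    echo "$backlog"
}
```

With these hypothetical inputs, `backlog_after 120 100 60` prints 1200, while `backlog_after 100 120 60` prints 0. As long as arrivals outpace processing even slightly, the backlog only grows, which matches the pattern of a host that never catches up until it is restarted.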

Thankfully, VMware was working on a bug-fix release at the time, NSX 6.2.8, and our issue served to spur the development team along in finalizing the release, along with adding a few more fixes for bugs they had originally thought were resolved in the 6.2.4 release.

NSX 6.2.8 release notes

The fixes most relevant to the issues we faced were the following:

Fixed Issue 1849037: NSX Manager API threads get exhausted when communication link with NSX Edge is broken
Fixed Issue 1704940: You may encounter the purple diagnostic screen on the ESXi host if the pCPU count exceeds 256
Fixed Issue 1760940: NSX Manager High CPU triggered by many simultaneous vMotion tasks
Fixed Issue 1813363: Multiple IP addresses on same vNIC causes delays in firewall publish operation
Fixed Issue 1798537: DFW controller process on ESXi (vsfwd) may run out of memory

Upgrading to the NSX 6.2.8 release, and rethinking our security groups, brought stability back to our environment, although not all of the above issues were completely resolved, as we later found out. In short, most “fixes” were really just process improvements under the hood. Specifically, we could still cause 100% CPU utilization on the NSX appliance by putting too many hosts into maintenance mode consecutively. At the very least, though, the CPU utilization was more likely to recover on its own, without us needing to restart the service or appliance. Now why is that important, you might ask? Being a service provider, you want to quickly and efficiently roll through your hosts while doing upgrades, and having an inefficiency like this in the NSX code base can drastically extend maintenance windows. Unfortunately for us at the time, VMware came out with the 6.2.8 maintenance patch after 6.3.x, so the fixes were not yet a part of the 6.3.x release (see KB2150668).
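One way to act on that observation is to pace maintenance-mode entries against the appliance CPU. The sketch below is not something described in this post; it only shows the shape of such a loop. Both hook functions (`nsx_cpu_percent`, `enter_maintenance`), the `POLL_SECONDS` variable, and the host names are hypothetical placeholders, not real NSX or vSphere commands.

```shell
#!/bin/sh
# Sketch only. Both hooks below are hypothetical placeholders, not real
# NSX or vSphere commands; replace them with whatever your tooling provides.

# Hypothetical hook: print NSX Manager CPU utilization as an integer percent.
nsx_cpu_percent() {
    echo 0   # dry-run stub
}

# Hypothetical hook: put one ESXi host into maintenance mode.
enter_maintenance() {
    echo "would enter maintenance mode: $1"   # dry-run stub
}

# roll_hosts THRESHOLD HOST...
# Waits until CPU is below THRESHOLD percent before starting each host.
roll_hosts() {
    threshold=$1
    shift
    for host in "$@"; do
        while [ "$(nsx_cpu_percent)" -ge "$threshold" ]; do
            sleep "${POLL_SECONDS:-30}"
        done
        enter_maintenance "$host"
    done
}
```

With the dry-run stubs left in place, `roll_hosts 80 esx01 esx02` just prints one "would enter maintenance mode" line per host. The trade-off is a longer rollout in exchange for not stacking maintenance-mode tasks on an appliance that is already saturated.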

As stated above, the instructions for restarting the bluelane-manager service on the NSX appliance are still very handy to have.

Download VMware NSX-T Data Center 2.3

September 19, 2018 captainvopsrd

Download VMware NSX-T Data Center, designed to address emerging application frameworks and architectures that have heterogeneous endpoints and technology stacks.

What does End of General Support mean?

September 19, 2018 captainvopsrd

On September 19th, vSphere 5.5 exited its general support phase and moved into something called “Technical Guidance”. In response to this, many have already moved to a newer release of the vSphere 6.x line. Whether it be for compatibility concerns or a reasonable wariness of touching what’s not broken, there are several of us who… The post What does End of General Support mean? appeared first on VMware vSphere Blog.

Introducing new VMware Cloud on AWS training course

September 19, 2018 captainvopsrd

A brand new 3-day training course has been released for VMware Cloud on AWS. Watch this lightboard illustration by one of our Senior Technical Instructors to understand more about what you’ll learn in the 3-day course. Find out more on http://www.vmware.com/education or read the course overview here.

VMware vSAN 6.7U1 Storage reclamation – TRIM/UNMAP

September 18, 2018 captainvopsrd

VMware vSAN 6.7U1 introduces automated space reclamation with support for TRIM and SCSI UNMAP. SCSI UNMAP and the ATA TRIM command enable the guest OS or file system to notify the back-end storage that a block is no longer in use and may be reclaimed.

Free Course: VMware Cloud Foundation Fundamentals

September 17, 2018 captainvopsrd

This eLearning course provides information about managing a software-defined data center using the VMware Cloud Foundation™ unified SDDC platform.

vSAN 6.7 – vSAN Management Today and in the Future

September 14, 2018 captainvopsrd

The momentum of technical innovation continues with the latest release of vSAN, the industry-leading HCI solution. This session will provide a technical overview of what’s new in vSAN 6.7. Join Duncan Epping, Chief Technologist, VMware EMEA, to learn about the new features and functionality of vSAN 6, and how this release delivers a more intuitive operating experience and a more consistent application experience, whilst offering a more holistic support experience for our customers.

Register for VMworld 2018 Europe!

September 14, 2018 captainvopsrd

VMworld 2018 Europe takes Barcelona by storm this November. Meet experts, learn about industry hot topics, preview new hands-on labs, and attend networking events.

CaptainvOPS

Month: September 2018
