vMotion fails at 67% on ESXi 6 in vCenter 6.

I came across an interesting error the other night while on call: VMs could not vMotion off of one host in a cluster, either manually or through DRS.  I was seeing the following error message in vSphere:


The source detected that the destination failed to resume.

vMotion migration [-1062731518:1481069780557682] failed: remote host <192.168.1.2> failed with status Failure.
vMotion migration [-1062731518:1481069780557682] failed to asynchronously receive and apply state from the remote host: Failure.
Failed waiting for data. Error 195887105. Failure.


 

  • While tailing the hostd log on the source host, I saw the following error:
2016-12-09T19:44:40.373Z warning hostd[2B3E0B70] [Originator@6876 sub=Libs] ResourceGroup: failed to instantiate group with id: -591724635. Sysinfo error on operation returned status : Not found. Please see the VMkernel log for detailed error information

 

  • While tailing the hostd log on the destination host, I saw the following error:
2016-12-09T19:44:34.330Z info hostd[33481B70] [Originator@6876 sub=Snmpsvc] ReportVMs: processing vm 223
2016-12-09T19:44:34.330Z info hostd[33481B70] [Originator@6876 sub=Snmpsvc] ReportVMs: serialized 36 out of 36 vms
2016-12-09T19:44:34.330Z info hostd[33481B70] [Originator@6876 sub=Snmpsvc] GenerateFullReport: report file /tmp/.vm-report.xml generated, size 915 bytes.
2016-12-09T19:44:34.330Z info hostd[33481B70] [Originator@6876 sub=Snmpsvc] PublishReport: file /tmp/.vm-report.xml published as /tmp/vm-report.xml
2016-12-09T19:44:34.330Z info hostd[33481B70] [Originator@6876 sub=Snmpsvc] NotifyAgent: write(33, /var/run/snmp.ctl, V) 1 bytes to snmpd
2016-12-09T19:44:34.330Z info hostd[33481B70] [Originator@6876 sub=Snmpsvc] GenerateFullReport: notified snmpd to update vm cache
2016-12-09T19:44:34.330Z info hostd[33481B70] [Originator@6876 sub=Snmpsvc] DoReport: VM Poll State cache - report completed ok
2016-12-09T19:44:40.317Z warning hostd[33081B70] [Originator@6876 sub=Libs] ResourceGroup: failed to instantiate group with id: 727017570. Sysinfo error on operation returned status : Not found. Please see the VMkernel log for detailed error information

 

  • While tailing the vmkernel.log on the destination host, I saw the following error:
2016-12-09T19:44:22.000Z cpu21:35086 opID=b5686da8)World: 15516: VC opID AA8C46D5-0001C9C0-81-91-cb-a544 maps to vmkernel opID b5686da8
2016-12-09T19:44:22.000Z cpu21:35086 opID=b5686da8)Config: 681: "SIOControlFlag2" = 1, Old Value: 0, (Status: 0x0)
2016-12-09T19:44:22.261Z cpu21:579860616)World: vm 579827968: 1647: Starting world vmm0:oats-agent-2_(e00c5327-1d72-4aac-bc5e-81a10120a68b) of type 8
2016-12-09T19:44:22.261Z cpu21:579860616)Sched: vm 579827968: 6500: Adding world 'vmm0:oats-agent-2_(e00c5327-1d72-4aac-bc5e-81a10120a68b)', group 'host/user/pool34', cpu: shares=-3 min=0 minLimit=-1 max=4000, mem: shares=-3 min=0 minLimit=-1 max=1048576
2016-12-09T19:44:22.261Z cpu21:579860616)Sched: vm 579827968: 6515: renamed group 5022126293 to vm.579860616
2016-12-09T19:44:22.261Z cpu21:579860616)Sched: vm 579827968: 6532: group 5022126293 is located under group 5022124087
2016-12-09T19:44:22.264Z cpu21:579860616)MemSched: vm 579860616: 8112: extended swap to 46883 pgs
2016-12-09T19:44:22.290Z cpu20:579860616)Migrate: vm 579827968: 3385: Setting VMOTION info: Dest ts = 1481312661276391, src ip = <192.168.1.2> dest ip = <192.168.1.17> Dest wid = 0 using SHARED swap
2016-12-09T19:44:22.293Z cpu20:579860616)Hbr: 3394: Migration start received (worldID=579827968) (migrateType=1) (event=0) (isSource=0) (sharedConfig=1)
2016-12-09T19:44:22.332Z cpu0:33670)MigrateNet: vm 33670: 2997: Accepted connection from <::ffff:192.168.1.2>
2016-12-09T19:44:22.332Z cpu0:33670)MigrateNet: vm 33670: 3049: data socket size 0 is less than config option 562140
2016-12-09T19:44:22.332Z cpu0:33670)MigrateNet: vm 33670: 3085: dataSocket 0x430610ecaba0 receive buffer size is 562140
2016-12-09T19:44:22.332Z cpu0:33670)MigrateNet: vm 33670: 2997: Accepted connection from <::ffff:192.168.1.2>
2016-12-09T19:44:22.332Z cpu0:33670)MigrateNet: vm 33670: 3049: data socket size 0 is less than config option 562140
2016-12-09T19:44:22.332Z cpu0:33670)MigrateNet: vm 33670: 3085: dataSocket 0x4306110fab70 receive buffer size is 562140
2016-12-09T19:44:22.332Z cpu0:33670)VMotionUtil: 3995: 1481312661276391 D: Stream connection 1 added.
2016-12-09T19:44:24.416Z cpu1:32854)elxnet: elxnet_allocQueueWithAttr:4255: [vmnic0] RxQ, QueueIDVal:2
2016-12-09T19:44:24.416Z cpu1:32854)elxnet: elxnet_startQueue:4383: [vmnic0] RxQ, QueueIDVal:2
2016-12-09T19:44:24.985Z cpu12:579860756)VMotionRecv: 658: 1481312661276391 D: Estimated network bandwidth 471.341 MB/s during pre-copy
2016-12-09T19:44:24.994Z cpu4:579860755)VMotionSend: 4953: 1481312661276391 D: Failed to receive state for message type 1: Failure
2016-12-09T19:44:24.994Z cpu4:579860755)WARNING: VMotionSend: 3979: 1481312661276391 D: failed to asynchronously receive and apply state from the remote host: Failure.
2016-12-09T19:44:24.994Z cpu4:579860755)WARNING: Migrate: 270: 1481312661276391 D: Failed: Failure (0xbad0001) @0x4180324c6786
2016-12-09T19:44:24.994Z cpu4:579860755)WARNING: VMotionUtil: 6267: 1481312661276391 D: timed out waiting 0 ms to transmit data.
2016-12-09T19:44:24.994Z cpu4:579860755)WARNING: VMotionSend: 688: 1481312661276391 D: (9-0x43ba40001a98) failed to receive 72/72 bytes from the remote host <192.168.1.2>: Timeout
2016-12-09T19:44:24.998Z cpu4:579860616)WARNING: Migrate: 5454: 1481312661276391 D: Migration considered a failure by the VMX. It is most likely a timeout, but check the VMX log for the true error.

 

We are using the vSphere Distributed Switch in our environment, and each host has a vmk dedicated to vMotion traffic only, so this was my first test: I verified the IP and subnet of the vMotion vmk on the source and destination hosts, successfully pinged the destination host using vmkping, and tested the connection both ways.
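
For reference, the vmk configuration side of that check can also be pulled quickly with PowerCLI (host names are placeholders); the vmkping test itself is run from an SSH session on the source host:

#List the vMotion-enabled VMkernel adapters and their IP/subnet/MTU on both hosts
Get-VMHostNetworkAdapter -VMHost "<sourcehost>","<destinationhost>" -VMKernel |
    Where-Object { $_.VMotionEnabled } |
    Select-Object VMHost, Name, IP, SubnetMask, Mtu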

My second test was to power off a VM and try to vMotion it off the host – this worked.  When I powered the VM back on, it immediately moved back to the source host.  I then tried to vMotion the same VM, now powered on, from the affected source host to the destination host as before, and to my surprise it worked this time.  I tested this process with a few other VMs for consistency.  I also tried restarting a VM on the affected host and then moving it off to another host, but that did not work.

My final test was to vMotion a VM from a different host to the affected host.  This worked as well, and I was even able to vMotion it off of the affected host again.

 

In our environment we have a Trend Micro agent VM and a Guest Introspection (GI) VM running on each host.  I logged into the vSphere Web Client to check the status of the Trend Micro VM and found no indication of an error; the GI VM showed the same healthy status.

Knowing we had had issues with Trend Micro in the past, I powered down the Trend Micro agent VM running on the host and attempted a manual vMotion of a test VM I knew couldn’t move before – IT WORKED.  I tried another with the same result, then put the host into maintenance mode and successfully evacuated the rest of the customer VMs from it!
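
For what it’s worth, the same workaround could be scripted with PowerCLI; a rough sketch, where the host name is a placeholder and the agent VM naming pattern is an assumption about our environment:

#Power off the Trend Micro agent VM running on the affected host
Get-VMHost "<affectedhost>" | Get-VM | Where-Object { $_.Name -like "Trend Micro*" } | Stop-VM -Confirm:$false

#Then enter maintenance mode; DRS migrates the powered-on VMs, and -Evacuate asks vCenter to move the powered-off and suspended ones as well
Set-VMHost -VMHost "<affectedhost>" -State Maintenance -Evacuate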

To wrap all of this up, the Trend Micro agent VM running on the ESXi 6 host was preventing other VMs from vMotioning off, either manually or through DRS.  Once the Trend Micro agent VM was powered off, I was able to evacuate the host.

 

Creating vROps Policies and How To Apply Them To Object Groups.

Creating policies in VMware’s vRealize Operations appliance can be straightforward if you have a decent understanding of each platform it’s monitoring.  In my last post of this series, I covered the creation of object groups; that post is relevant here because policies can be created and assigned to those object groups, allowing you to tune the alerts received for them.

Once logged in to the vROps appliance, go into the Administration section, where you will find Policies.

vrops37.png

  • VMware has included many base policies in the policy library, which in most cases will be fine for the initial configuration of the appliance, but you may want to create additional policies to suit your specific environment’s needs.
  • Also take note of the blue film strip in the upper right corner.  This takes you to VMware’s video repository, with a brief how-to video explaining policies.  These video links can be found throughout the configuration of the appliance, and more are added with each release.

To create a new policy, click the green plus sign to get started.  Give the policy a unique name; it is good practice to also give a description of what the policy is intended to do.  When creating a policy, you have the ability to “start with” a VMware pre-defined policy, and I recommend taking advantage of that until you have a firm understanding of what these policies do.

vrops38

On the Select Base Policy tab, you can use the drop-down menu on the left to get a policy overview of what is being monitored.  In this example, Host System was selected.

vrops39

Policy Overrides can also be incorporated into this policy.  In other words, if there are certain alerts that you do not want, one of the pre-defined policies may already have those alerts turned off, so those policies can be added to the new policy being created here.  Work smarter, not harder, right?

vrops40

Moving along to the Analysis Settings tab, here you can see how vROps analyzes alerts, determines thresholds, and assigns system badges.  These can be left at the settings inherited from the policy you are building off of, or you can click the padlock to the right and make individual changes.  Keep in mind that under the “Show changes for” drop-down menu there are many objects you can select to change the analysis settings on.

vrops41

The Alert/Systems Definitions tab is probably where the majority of your time will be spent.  The “Alert Definitions” box at the top is where alerts can be turned on or off, based on the base policy used to create this one or the override policies used.

  • Each management pack installed will have its own category of object types.  In other words, “host system” is listed under the vCenter category, but if the vCloud Director management pack is installed, it will also have a “host system” under its category.  Each management pack has the ability to add additional alerts for objects referenced in other management packs.  Take time going through each category to see what alerts may need configuring.
  • The State of each alert will be one of the following: Local with a green check mark, meaning you enabled it; Inherited with a grey check mark, meaning it is enabled via another policy used to create this one; Local with a red crossed-out circle, meaning you disabled the alert for this policy; or Inherited with a greyed-out crossed-out circle, meaning it is disabled via another policy used to create this one.  Disabling alerts here still allows metrics to be collected for the object; you just won’t get the alarm for it.
  • The System Definitions section has the same “object type” drop-down menu; select the object type here to configure system thresholds for how symptoms are triggered for the alert selected in the Alert Definitions box above.  I typically do these in tandem.

vrops43

Finally, on the Apply Policy to Groups tab, you can apply the policy to the custom groups you created earlier.

vrops42

Once you click Save and go back to the Active Policies tab, you will see the new policy, and within five minutes you should see its Affected Objects count rise.  You can see here that I have a policy marked with a “D”, meaning it is the default appliance policy.  You can set your own policy as the default by clicking the blue circle icon with the arrow on the upper left side.  It may take up to 24 hours before the active alerts page reflects the settings of the new policy; otherwise, you can manually clear those alerts.

vrops44

Previous post to this series: Configuring VMware vRealize Operations Manager Object Groups

ESXi host fails to upgrade from 5.5 Update 3 to 6 Update 2

This happened to me today and I thought it was worth sharing.  Most of the hosts in this particular cluster upgraded fine from ESXi 5.5u3 to ESXi 6u2, with the exception of this one host.  Update Manager kept giving me the error “Cannot run upgrade script on host” in the vCenter Recent Tasks pane.

esxi01

A quick Google search brought me to KB article 2007163, but after following the KB I wasn’t able to find the referenced error “Remediation failed due to non mode failure” on the Update Manager server (Win2008) in the C:\AppData\VMware\Update Manager\Logs\vmware-vum-server-log4cpp.log file.

I started an SSH session to the ESXi host, but wasn’t able to find an entry similar to the error “OSError: [Errno 39] Directory not empty:” in the /var/log/vua.log file.

Instead, I found this error:

—————————————————————-

[FFD0D8C0 error 'Default'] Alert:WARNING: This application is not using QuickExit().

The exit code will be set to 0.@ bora/vim/lib/vmacore/main/service.cpp:147

--> Backtrace:

--> backtrace[00] rip 1bc228c3 Vmacore::System::Stacktrace::CaptureFullWork(unsigned int)

—————————————————————–

By chance, I happened to check free space on the ESXi host with # df -h and found a partition that was 100% full.

esxi02

So I changed directory to it with # cd /storage/core/……./, where I found two more directories, /var/core/.  Using # ls to list the directory, I found two zdumps.

esxi03

I deleted the two zdumps and then checked the space again with # df -h.

esxi04

Seeing that the partition was now 68% utilized instead of 100%, I attempted the ESXi 6u2 upgrade again, this time with success.

 

 

How to evacuate virtual machines from one host to another with PowerCLI

If you ever find yourself needing to evacuate an ESXi host, there is a handy PowerCLI one-liner that can do just that, and it maintains the resource pools for the virtual machines too.  This was used with vCenter 6u2, PowerCLI 6.3 R1, and ESXi 5.5.

–  –  –  –  –  –  –  –  –  –  –  –  –  –  –  –  –  –  –  –  –  –  –  –  –  –  –  –  –  –  –  –  –  –  –  –

Connect-VIserver <vcenter-name/ip>

Get-VM -Location "<sourcehost>" | Move-VM -Destination (Get-VMHost "<destinationhost>")

–  –  –  –  –  –  –  –  –  –  –  –  –  –  –  –  –  –  –  –  –  –  –  –  –  –  –  –  –  –  –  –  –  –  –  –

I recently went through an outage that affected our ability to put hosts into maintenance mode, as the vMotion operation would get stuck at 13% at the vCenter level, with no indication at the host level that anything was happening.  This PowerCLI command allowed me to evacuate one host’s virtual machines onto another, getting me through all 18 hosts in the cluster.
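
If you only want to touch the powered-on VMs, or would rather queue the migrations as tasks instead of waiting on each one, a small variation of the same one-liner (same placeholders as above) works too:

#Move only the powered-on VMs, queuing each vMotion as an asynchronous task
Get-VM -Location "<sourcehost>" | Where-Object { $_.PowerState -eq "PoweredOn" } | Move-VM -Destination (Get-VMHost "<destinationhost>") -RunAsync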

 

 

Nightly Automated Script To Gather VM and Host Information, and Output To A CSV

Admittedly this was my first attempt at creating a PowerShell script, but I thought I would share my journey.  We wanted a way to track the location and power state of customer VMs running on our ESXi hosts in the event of a host failure.  Ideally this would be automated to run multiple times a day, with the output saved to a CSV on a network share.

We had discovered a bug in vCenter 5.5: if an ESXi 5.5 host was in a disconnected state and an attempt was made to reconnect it using the vSphere Client without knowing that the host was actually network isolated, HA would not restart the VMs on another host as expected.  We later tested this in a lab and found that if we had not used the reconnect option, HA would restart the VMs on other hosts as expected.  We tested the scenario again in vCenter 6 Update 2, and the bug was not present.

So the first PowerCLI one-liner I came up with was the following:

> get-vm | select VMhost, Name, @{N="IP Address";E={@($_.guest.IPaddress[0])}}, PowerState | where {$_.PowerState -ne "PoweredOff"} | sort VMhost

get-vm1

I wanted a list of powered-on VMs, their IPs, and which host they were running on, sorted by host name.  Knowing I was on the right track, I next wanted to connect to multiple data centers and save each data center’s output to a CSV on a network share.  I didn’t need to keep the CSVs on that network share for more than seven days, so I also wanted to build in logic so the script would essentially clean up after itself.

That script looks something like this:

#Initial variables
$vCenter = @()
$sites = @("vcenter01","vcenter02","vcenter03")

#Loop through the array of sites and establish a connection to each vCenter
foreach ($site in $sites) {
    $vCenter = $site + ".domain.net"

    Connect-VIServer $vCenter

    #Get the VMs that are not powered off, their IPs, and which hosts they're running on
    get-vm | select VMhost, Name, @{N="IP Address";E={@($_.guest.IPaddress[0])}}, PowerState | where {$_.PowerState -ne "PoweredOff"} | sort VMhost | Export-Csv -Path "c:\path\to\output\$site $((Get-Date).ToString('MM-dd-yyyy_hhmm')).csv" -NoTypeInformation -UseCulture

    #Disconnect from the current vCenter
    Disconnect-VIServer -Force -Confirm:$false -Server $vCenter

}

#Cleanup old csv after 7 days.
$limit = (Get-Date).AddDays(-7)
$path = "c:\path\to\output\"

Get-ChildItem -Path $path -Recurse -Force | Where-Object { !$_.PSIsContainer -and $_.CreationTime -lt $limit } | Remove-Item -Force
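
Since the goal was to have this run automatically multiple times a day, one option is a scheduled task on the script server (Server 2012 or later; older servers can use schtasks.exe instead).  A minimal sketch, assuming the script is saved as C:\Scripts\vm-report.ps1 and run as a service account with read access to the vCenters – both hypothetical:

#Run the report nightly at 11 PM; adjust or add triggers for multiple runs per day
$action  = New-ScheduledTaskAction -Execute "powershell.exe" -Argument "-NoProfile -ExecutionPolicy Bypass -File C:\Scripts\vm-report.ps1"
$trigger = New-ScheduledTaskTrigger -Daily -At 11:00PM
Register-ScheduledTask -TaskName "Nightly VM Report" -Action $action -Trigger $trigger -User "DOMAIN\svc-vmreport" -Password "<password>"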

Something I did not know until after running this in a large production environment is that the Get-VM call is heavy and not very efficient.  When I ran this script in my lab, it took less than 15 seconds.  In production, however, connecting to data centers all over the globe, it took over 40 minutes to run.

A colleague of mine with automation experience pointed me to another cmdlet called Get-View, and said it would be much faster since it is more efficient at gathering only the data needed.  So I rewrote my script, and now it looks like this:

_________________________________________________________________

get-vm4-5

 

_________________________________________________________________
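
The rewritten script itself is shown in the screenshot above, but as a rough sketch, the Get-View version of the collection step (dropped into the same foreach loop over $sites as before) looks something like this; note that the property paths come from the vSphere API rather than the PowerCLI VM object:

#Build a lookup of host MoRef -> host name once per vCenter
$hostLookup = @{}
Get-View -ViewType HostSystem -Property Name | ForEach-Object { $hostLookup[$_.MoRef.ToString()] = $_.Name }

#Get-View only retrieves the properties asked for, which is far lighter than Get-VM
Get-View -ViewType VirtualMachine -Property Name,"Runtime.PowerState","Runtime.Host","Guest.IpAddress" |
    Where-Object { $_.Runtime.PowerState -ne "poweredOff" } |
    Select-Object @{N="VMhost";E={ $hostLookup[$_.Runtime.Host.ToString()] }},
                  Name,
                  @{N="IP Address";E={ $_.Guest.IpAddress }},
                  @{N="PowerState";E={ $_.Runtime.PowerState }} |
    Sort-Object VMhost |
    Export-Csv -Path "c:\path\to\output\$site $((Get-Date).ToString('MM-dd-yyyy_hhmm')).csv" -NoTypeInformation -UseCulture

One difference from Get-VM is that Get-View also returns VM templates, so filter those out if that matters in your environment.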

The new code took less than a couple of minutes to run in my production environment.  I have a Windows VM deployed that runs the VMware PowerActions fling along with some scheduled scripts.  This script would be running from that server, so I added an additional function that writes a Windows event log entry so the run could be tracked from a syslog server.
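
The event log piece can be as simple as the two cmdlets below; the source name is a hypothetical one, and New-EventLog only needs to be run once on the script server:

#One-time registration of the event source on the script server
New-EventLog -LogName Application -Source "VMReportScript"

#Written at the end of each run so the entry can be picked up and forwarded to the syslog server
Write-EventLog -LogName Application -Source "VMReportScript" -EventId 1000 -EntryType Information -Message "VM location report completed for: $($sites -join ', ')"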

So the final script can be downloaded here.  *Disclaimer – test this in a lab first, as the code will need to be updated to suit your needs.

 

 

Add The vROps License, Configuring LDAP, and The SMTP For vRealize Operations Manager (vROps)

If you’ve been following my previous posts, I discussed what vRealize Operations Manager is, how to get the appliance deployed, how to get the master replica, data nodes and remote collectors attached to the cluster, and finally how to get data collection started.

Now it’s time to license vRealize Operations Manager.  This can be achieved by logging into the appliance via <https://vrops-appliance/ui/login.action>.  Next, go into the Administration section, where you’ll see Licensing.  Click the green plus sign to add the vROps license.

vrops23

About seven items down from Licensing in the left-hand column, you will see Authentication Sources.  This is where you configure LDAP.

vrops22

Again click the green plus sign to configure the LDAP source.

vrops24

Once the credentials have been added, test the connection, and if everything checks out, click OK.

Lastly, let’s get the SMTP service configured.  About three items down from Authentication Sources you’ll find Outbound Settings.  Click the green plus sign to add a new SMTP instance.

vrops25

Once you have the SMTP information added, test the connection, and if everything checks out click save.

vrops26

So now you should have a functioning, licensed vROps instance.  In future posts I will cover creating object groups and policies, and configuring alert emails.

Next Post: Configuring VMware vRealize Operations Manager Object Groups

Last Post: Configuring VMware vRealize Operations Manager Adapters For Data Collection

Configuring VMware vRealize Operations Manager Adapters For Data Collection

If you’ve followed my recent blog post on Installing vRealize Operations Manager (vROps) Appliance, you are now ready to configure the built-in vSphere adapter to start data collection.

Depending on how big your environment is, and if you have remote collectors deployed, you may want to consider configuring collector groups.  A collector group lets you group multiple remote collectors within the same data center to provide resiliency: with the vROps adapters pointed at the collector group instead of an individual remote collector, if one remote collector goes down, the others pick up the slack and continue collecting from that data center, so there is no data loss.  You can also create a collector group for a single remote collector, which makes it easier to expand later if you want to add that data-collection resiliency.

Go ahead and log into the appliance using the regular UI <https://vrops-appliance/ui/login.action>.  From here, click Administration.  If you just need to configure the vSphere adapter for data collection, you can skip ahead to Section 2.  Otherwise, let’s continue with Section 1 and configure the collector groups.

Section 1

Click on Collector Groups

vrops11

You can see that I already have collector groups created for my remote data centers, but to create a new one, just click the green plus sign.

vrops12

Give the collector group a name, and then in the lower window select the corresponding remote collector.  Then rinse-wash-and-repeat until you have the collector groups configured.  Click Save when finished.  Now let’s move on to Section 2.

Section 2

From the Administration area, click on Solutions

vrops10

Because this is a new deployment, you will only have Operating Systems / Remote Service Monitoring and VMware vSphere.  For the purposes of this post, I will only cover configuring the VMware vSphere adapter.  Click it to select it, and then click the gears icon to the right of the green plus sign.

vrops13

Here, fill out the display name, the vCenter Server it will be connecting to, and the credentials.  If you click the drop-down arrow next to Advanced Settings, you will see the Collectors/Groups drop-down menu.  Expand that if you created custom collector groups in Section 1 and select the desired group.  Otherwise vROps will use the Default collector group, which is fine if you only have one data center; if you don’t have a collector group configured, I recommend at least selecting a remote collector here.  This puts the data-collection load onto the remote collectors and allows the cluster to focus on processing all of that lovely data.  Click Test Connection to verify connectivity, and then click Save.  Rinse-wash-and-repeat until you have all vCenters collecting, and click Close when finished.

It is important to note that vROps by default collects data every five minutes, and currently that is the lowest setting possible.  You can monitor the status of your solutions or adapters here; once they start collecting, their statuses will change to green.

vrops17
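
If you have PowerCLI handy, the vROps cmdlets in the VMware.VimAutomation.vROps module give another quick way to spot-check collection from the command line; a sketch, with the appliance name, credentials, resource name, and metric key as placeholders or assumptions:

#Connect to the vROps appliance and check that a known object is present
Connect-OMServer -Server "vrops-appliance" -User admin -Password "<password>"
Get-OMResource -Name "<host-or-vm-name>"

#Pull the last hour of a common metric as a sanity check that data is flowing
Get-OMStat -Resource (Get-OMResource -Name "<host-or-vm-name>") -Key "cpu|usage_average" -From (Get-Date).AddHours(-1)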

If you’d like to add additional solutions, otherwise known as “Management PAKs”, head over to VMware’s Solution Exchange.  I currently work for a cloud provider running NSX, so I also have the NSX and vCloud Director Management PAKs installed.  From the same Solutions page, instead of clicking the gears icon, click the green plus sign and add the additional solutions to your environment.  The same process is used when updating solutions to newer versions; currently there is no built-in notification when a newer version is available.

vrops14

Go to Global Settings on the Administration page, where you can configure the object history, or data retention policy, along with a few other settings.

vrops15

Finally, go back to Home by clicking the house icon.  By now the Health, Risk, and Efficiency badges should all be colored (ideally green, but your results may vary).  This is the final indication that vROps is collecting.

vrops16

 

Next Post: Add The vROps License, Configuring LDAP, and The SMTP For vRealize Operations Manager (vROps)

Recent Post: Sizing and Installing The VMware vRealize Operations (vROps) Appliance

Sizing and Installing The VMware vRealize Operations (vROps) Appliance

VMware has a sizing guide that will help you determine how many appliances you need to deploy.  If you have multiple data centers, somewhere north of 200 hosts, and more than 5,000 VMs, I’d recommend starting out with at least two nodes configured as Large deployments.  Once you get the built-in vSphere adapter collecting for each environment, you can run an audit using vROps to get the raw numbers and expand the cluster accordingly.  Come prepared: walk through your environments and get a list of how many hosts, datastores, and vCenters you have, plus a rough count of the virtual machines deployed.
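
A quick PowerCLI pass against each vCenter will give you those rough counts to plug into the sizing worksheet:

#Rough per-vCenter inventory counts for the vROps sizing worksheet
Connect-VIServer "<vcenter-name>"
"Hosts:      {0}" -f (Get-VMHost).Count
"Datastores: {0}" -f (Get-Datastore).Count
"VMs:        {0}" -f (Get-VM).Count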

KB 2093783 has more details on sizing, and I strongly urge you to visit the KB, as it links to the latest releases of vROps, and each KB has a sizing guide attachment at the bottom where you can enter the information you collected from your environment to get a more accurate size.

_________________________________________________________________

Appliance Manual Installation

________________________________________________________________

Architectural Note

  • Before proceeding be sure you have:
    • The appropriate host resources
    • The appropriate storage
    • IP addresses assigned and entered into DNS
    • A “read-only” account configured in AD and vCenter
    • The appropriate ports opened between data centers, as listed in VMware’s documentation

_________________________________________________________________

Once you have the latest edition of the vROps appliance OVF downloaded, and after consulting the documentation, use either the vSphere client or the web client to deploy the OVF template.  I’ll skip through browsing for, verifying the details of, accepting the license agreement for, and naming the appliance.

So now you’ve come to the OVF deployment step where you must select the size of your appliance.  No matter the size, the remainder of the deployment is the same, but for this example I will deploy an appliance as Large.

You can deploy the appliance in several sizing configurations depending on the size of your environment: Extra Small, Small, Medium, and Large.

  • Extra Small = 2 vCPUs and 8GB of memory
  • Small = 4 vCPUs and 16GB of memory
  • Medium = 8 vCPUs and 32GB of memory
  • Large = 16 vCPUs and 48GB of memory

You can also choose to deploy a remote collector, which comes in two sizes:

  • Standard = 2 vCPUs and 4GB of memory
  • Large = 4 vCPUs and 16GB of memory

vrops1

You will notice that with each selection, VMware has given a definition of what it entails.  Choose the one that best suits your needs, then click Next.

Storage dialog

  • Depending on the size of your environment, vROps VMs can each grow to over a terabyte in size
  • Once you’ve made your selection click next
  • Architectural Note – If adding a master replica node to your vROps cluster, I’d recommend keeping the master and master replica on separate XIVs, or whatever you use to serve up storage to your environment.

Disk Format dialog

  • The default is Lazy Zeroed, and that’s how my environments have been deployed.  I’d strongly advise against using thin provisioning for this appliance.
  • Once you’ve made your selection click next.

Network Mapping dialog

  • Select the appropriate destination network, such as a management network, from which it can reach your hosts, VMs, vCenters, and datastores.
  • Once you’ve made your selection click next.

Properties dialog

  • Here you can set the Timezone for the appliance, and choose whether to use IPv6
  • Once you’ve filled out the network information, click next

Configuration Verification dialog

  • Read it carefully to be sure there were no fat fingers at play.  Click finish when ready.

_________________________________________________________________

Before you power on the appliance, you may want to take the opportunity to expand its disk.  This can be done a couple of ways.  You can expand the existing Hard Disk 2, but keep in mind that the current file system can only see disks under 2TB; any disk space allocated over 2TB the appliance won’t be able to see.  For my production environment, I increased disk 2 to 1TB in size and then added 500GB disks as more storage was needed.  Also keep in mind the amount of data you are going to retain.  My appliances are configured for six months of retention, but this can be changed as needs change; we’ll go over this in another post.  The cool thing about this appliance is that as you increase the size of disk 2, or add additional disks, it automatically expands its data partition during the power-on process.
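
Either change can also be made from PowerCLI before the first power-on; a quick sketch, assuming the appliance VM is named vrops-01 (a hypothetical name) and keeping each disk under the 2TB limit mentioned above:

#Grow the existing data disk (Hard disk 2) to 1TB
Get-VM "vrops-01" | Get-HardDisk -Name "Hard disk 2" | Set-HardDisk -CapacityGB 1024 -Confirm:$false

#Or add another 500GB disk later when more retention space is needed
New-HardDisk -VM (Get-VM "vrops-01") -CapacityGB 500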

Power up the appliance and open a console to it in vCenter to watch it boot and go through some scripted configuration.

vrops4

  • To log in, press ALT + F1.  Enter root for the user, leave the password blank, and hit Enter.  You will then be prompted for the current password, so leave it blank and hit Enter again.  Now enter a new password, hit Enter, and enter the new password once more for verification.
  • Depending on how locked down your environment is you may not be able to, but I always ping out to 8.8.8.8, along with a few internal servers, to verify the network settings.
  • Also, unless you really enjoy VMware’s console, I’d recommend running a couple of commands to turn on SSH so any future administrative tasks can be performed over a PuTTY session.
    • The first command is:  # chkconfig sshd on
      • This enables the sshd service at system boot
    • The second command is: # service sshd start
      • This turns on the sshd service so you can connect to the box with a PuTTY session.

_________________________________________________________________

Using Microsoft Edge, Firefox, or Chrome, browse to <https://vrops-appliance-name/>.  This will redirect you to the Getting Started page, where you can choose from three options:

Express Installation, where you can set the admin password and that’s pretty much it.

vrops8

New Installation gives you a few more options to configure, like which NTP server(s) you want to use, and a TLS/SSL certificate you’ve created specifically for this system (or just use the built-in one).

vrops5

Expand An Existing Installation – this option is used for additional data nodes or remote collectors, which you’ll pick under “node type”.

vrops6

For this installation we will select New Installation.  As a rule of thumb, and for better appliance performance, use the NTP servers on your network that vCenter and the ESXi hosts are already using to keep time in check.  Once you’ve made it through the wizard, click Finish.

vrops9

It shouldn’t take too long for the master appliance to set up and take you to a login screen.

You’re not done yet, however.  You still have to configure your cluster if you have additional data nodes and remote collectors to add.  If you have a master replica, data nodes, or remote collectors, get them connected to the master.  Each has its own web UI at <https://vrops-appliance-name/>, only this time you use the Expand An Existing Installation option.  You will also need to log into the admin section for some of this: <https://vrops-appliance-name/admin/login.action>.

Let’s get the master replica added first.  When you use the Expand An Existing Installation option, you’ll need to add it as a data node, and then wait for the cluster to expand to it.

vrops18

Then click the Finish Adding New Nodes button.

vrops19

To enable HA, you’ll notice in the center of the screen that there is a High Availability option, but it is disabled.  Go ahead and click Enable.

vrops20

Now select the data node that will become the master replica, make sure Enable High Availability is checked, and click OK.  This part will take a little while, and the cluster services will be restarted.  Afterward, the High Availability status will show as Enabled.

vrops21

Add any remaining data nodes and remote collectors using the Expand An Existing Installation Option.

_________________________________________________________________

Architectural Note

  • I’d recommend going into vCenter and adding an anti-affinity rule to keep the master and master replica on separate hosts (see the PowerCLI sketch after this note)
  • If you’ve deployed vROps to its own host cluster, I’d recommend turning vSphere DRS down to conservative.  The appliances are usually pretty busy in an active environment, and having one vMotion on you can cause cluster performance degradation and throw some interesting alarms within vROps.  It will recover on its own, but it’s better to avoid this when possible.
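
A minimal sketch of that anti-affinity rule in PowerCLI, with the cluster and node VM names as assumptions:

#Keep the vROps master and master replica nodes on separate hosts
New-DrsRule -Cluster (Get-Cluster "<vrops-cluster>") -Name "Separate vROps master and replica" -KeepTogether:$false -VM (Get-VM "vrops-master","vrops-replica")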

_________________________________________________________________

Next up – you’ll need to configure the built-in vSphere adapter so you can start collecting data.  I’ll have more on that in my next post.

Next Post: Configuring VMware vRealize Operations Manager Adapters For Data Collection

Recent Post: What Is VMware’s vRealize Operations Manager?