vSphere HA configuration failures have been a regular issue for as long as I can remember, usually as a result of environmental issues. Recently I came across an issue where vSphere HA would not configure after upgrading to vCenter 7.0u2d.

HA configuration fails with the error "Setting desired image spec for cluster failed".

The vpxd logs showed that the wrong version of the vib, 7.0.2-18455215, was being pushed to the hosts. The version that should have been pushed is 7.0.2-18455184.

2021-09-29T14:10:55.802+10:00 error vpxd[30808] [Originator@6876 sub=DAS opID=ku3nmwor-192033-auto-446a-h5:70029211-63-01] Encountered error(com.vmware.vapi.std.errors.not_found) while retrieving image spec for solution com.vmware.vsphere-ha
2021-09-29T14:10:55.802+10:00 error vpxd[30808] [Originator@6876 sub=DAS opID=ku3nmwor-192033-auto-446a-h5:70029211-63-01] HA solution is not present in the image spec
2021-09-29T14:10:55.802+10:00 info vpxd[30808] [Originator@6876 sub=DAS opID=ku3nmwor-192033-auto-446a-h5:70029211-63-01] Updating image spec to version 7.0.2-18455215 for solution com.vmware.vsphere-ha in cluster domain-c2003
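
A quick way to confirm which build vpxd is trying to push is to pull the version out of the "Updating image spec" line. Below is a minimal Python sketch, not an official tool. The names pushed_image_specs and EXPECTED are my own, and the regular expression assumes the message format stays as shown in the excerpt above.

```python
import re

# Message format taken from the vpxd.log excerpt above (assumed to be stable).
SPEC_RE = re.compile(
    r"Updating image spec to version (\S+) for solution (\S+) in cluster (\S+)"
)

# Build that should have been pushed in this environment (from this post).
EXPECTED = "7.0.2-18455184"


def pushed_image_specs(lines, solution="com.vmware.vsphere-ha"):
    """Return (version, cluster) pairs that vpxd logged for the given solution."""
    found = []
    for line in lines:
        match = SPEC_RE.search(line)
        if match and match.group(2) == solution:
            found.append((match.group(1), match.group(3)))
    return found


SAMPLE = [
    "2021-09-29T14:10:55.802+10:00 info vpxd[30808] [Originator@6876 sub=DAS "
    "opID=ku3nmwor-192033-auto-446a-h5:70029211-63-01] Updating image spec to "
    "version 7.0.2-18455215 for solution com.vmware.vsphere-ha in cluster domain-c2003",
]

for version, cluster in pushed_image_specs(SAMPLE):
    status = "ok" if version == EXPECTED else "unexpected"
    print(f"{cluster}: {version} ({status})")
```

To run it against a real log, pass an open file handle for /var/log/vmware/vpxd/vpxd.log instead of SAMPLE.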

To resolve this issue, we reset the Lifecycle Manager (Update Manager) database by running the command below and then restarting the management agents.

/usr/lib/vmware-updatemgr/bin/updatemgr-utility.py reset-db

More info on resetting the vum database can be found here

I wanted to write this post to keep track of the log events that we can use to troubleshoot HA configuration failures. You can use the excerpts below, taken from a successful HA configuration, to filter events and troubleshoot HA failures. The same excerpts can be used with vRealize Log Insight as well!

vCenter Server logs - /var/log/vmware/vpxd/vpxd.log

  1. Begin task - The key things to keep track of are the opID kpo0kwu5-9142709-auto-5fyk6-h5, the task ID task-679499 and the task name vim.ComputeResource.reconfigureEx. domain-c4727 is the managed object ID of the vSphere cluster.
2021-10-04T23:02:15.761Z info vpxd[06614] [Originator@6876 sub=vpxLro opID=kpo0kwu5-9142709-auto-5fyk6-h5:70331855-e9] [VpxLRO] -- BEGIN task-679499 -- domain-c4727 -- vim.ComputeResource.reconfigureEx -- 52fb1030-3448-9fca-5788-c6e3c1c03ed1(5257f69b-6da4-2893-f1a0-91a6dfed6bd0)
  1. The current HA state - Each host's HA state is transitioning: HA disabled -> uninitialized
2021-10-04T23:02:15.858Z info vpxd[06614] [Originator@6876 sub=MoHost opID=kpo0kwu5-9142709-auto-5fyk6-h5:70331855-e9] VC state for host host-4733 (HA disabled -> uninitialized), FDM state (UNKNOWN_FDM_HSTATE -> UNKNOWN_FDM_HSTATE), src of state (null -> null)
2021-10-04T23:02:15.860Z info vpxd[06614] [Originator@6876 sub=MoHost opID=kpo0kwu5-9142709-auto-5fyk6-h5:70331855-e9] VC state for host host-4762 (HA disabled -> uninitialized), FDM state (UNKNOWN_FDM_HSTATE -> UNKNOWN_FDM_HSTATE), src of state (null -> null)
2021-10-04T23:02:15.861Z info vpxd[06614] [Originator@6876 sub=MoHost opID=kpo0kwu5-9142709-auto-5fyk6-h5:70331855-e9] VC state for host host-4741 (HA disabled -> uninitialized), FDM state (UNKNOWN_FDM_HSTATE -> UNKNOWN_FDM_HSTATE), src of state (null -> null)
  1. Begin HA configuration on the ESXi host - DasConfig.ConfigureHost
2021-10-04T23:02:15.886Z info vpxd[07052] [Originator@6876 sub=vpxLro opID=kpo0kwu5-9142709-auto-5fyk6-h5:70331855-e9-01] [VpxLRO] -- BEGIN task-679500 -- vesxi02.gs.labs -- DasConfig.ConfigureHost -- 
2021-10-04T23:02:15.889Z info vpxd[63822] [Originator@6876 sub=vpxLro opID=kpo0kwu5-9142709-auto-5fyk6-h5:70331855-e9-02] [VpxLRO] -- BEGIN task-679501 -- vesxi03.gs.labs -- DasConfig.ConfigureHost -- 
2021-10-04T23:02:15.891Z info vpxd[63703] [Originator@6876 sub=vpxLro opID=kpo0kwu5-9142709-auto-5fyk6-h5:70331855-e9-03] [VpxLRO] -- BEGIN task-679502 -- vesxi01.gs.labs -- DasConfig.ConfigureHost -- 
  1. Start FDM Service. This will fail if the fdm vib is not already installed on the host, as it does here before the vib is installed - VpxdDas::StartFdmService
2021-10-04T23:02:16.947Z error vpxd[63822] [Originator@6876 sub=DAS opID=kpo0kwu5-9142709-auto-5fyk6-h5:70331855-e9-02] [VpxdDas::StartFdmService] Failed to start FDM service on host vesxi03.gs.labs:N3Vim5Fault8NotFound9ExceptionE(Fault cause: vim.fault.NotFound
2021-10-04T23:02:16.950Z error vpxd[63703] [Originator@6876 sub=DAS opID=kpo0kwu5-9142709-auto-5fyk6-h5:70331855-e9-03] [VpxdDas::StartFdmService] Failed to start FDM service on host vesxi01.gs.labs:N3Vim5Fault8NotFound9ExceptionE(Fault cause: vim.fault.NotFound
2021-10-04T23:02:17.529Z error vpxd[07052] [Originator@6876 sub=DAS opID=kpo0kwu5-9142709-auto-5fyk6-h5:70331855-e9-01] [VpxdDas::StartFdmService] Failed to start FDM service on host vesxi02.gs.labs:N3Vim5Fault8NotFound9ExceptionE(Fault cause: vim.fault.NotFound
  1. Initial SOAP calls to the ESXi hosts in the cluster - Creating SOAP stub adapter for /sdk
2021-10-04T23:02:19.045Z info vpxd[63703] [Originator@6876 sub=Vmomi opID=kpo0kwu5-9142709-auto-5fyk6-h5:70331855-e9-03] Creating SOAP stub adapter for /sdk on vesxi01.gs.labs:443
2021-10-04T23:02:19.589Z info vpxd[07052] [Originator@6876 sub=Vmomi opID=kpo0kwu5-9142709-auto-5fyk6-h5:70331855-e9-01] Creating SOAP stub adapter for /sdk on vesxi02.gs.labs:443

2021-10-04T23:03:14.403Z info vpxd[63822] [Originator@6876 sub=Vmomi opID=kpo0kwu5-9142709-auto-5fyk6-h5:70331855-e9-02-01] Creating SOAP stub adapter for /fdm on vesxi03.gs.labs:443

  1. Push FDM config to ESXi hosts - VpxdDasConfig::PushConfigToFDM
2021-10-04T23:03:15.536Z info vpxd[63822] [Originator@6876 sub=DAS opID=kpo0kwu5-9142709-auto-5fyk6-h5:70331855-e9-02-01] [VpxdDasConfig::PushConfigToFDM] pushed config version 4  to host [vim.HostSystem:host-4762,vesxi03.gs.labs] (cluster [vim.ClusterComputeResource:domain-c4727,VAPP01])
2021-10-04T23:03:22.765Z info vpxd[63703] [Originator@6876 sub=DAS opID=kpo0kwu5-9142709-auto-5fyk6-h5:70331855-e9-03-01] [VpxdDasConfig::PushConfigToFDM] pushed config version 4  to host [vim.HostSystem:host-4733,vesxi01.gs.labs] (cluster [vim.ClusterComputeResource:domain-c4727,VAPP01])
2021-10-04T23:03:26.354Z info vpxd[07052] [Originator@6876 sub=DAS opID=kpo0kwu5-9142709-auto-5fyk6-h5:70331855-e9-01-01] [VpxdDasConfig::PushConfigToFDM] pushed config version 4  to host [vim.HostSystem:host-4741,vesxi02.gs.labs] (cluster [vim.ClusterComputeResource:domain-c4727,VAPP01])
  1. Task completion - FINISH task-679499
2021-10-04T23:03:41.780Z info vpxd[06614] [Originator@6876 sub=vpxLro opID=kpo0kwu5-9142709-auto-5fyk6-h5:70331855-e9] [VpxLRO] -- FINISH task-679499
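
The vpxd.log steps above all hang off the same opID, so a small script can pull out the matching [VpxLRO] lines and show how long each task took. This is a rough Python sketch under my own assumptions: the function name task_durations is made up, and the regular expression only covers the line format shown in these excerpts.

```python
import re
from datetime import datetime

# Timestamp, opID and BEGIN/FINISH marker of a [VpxLRO] task line.
TASK_RE = re.compile(
    r"^(\S+) .*opID=([^\]\s]+)\].*-- (BEGIN|FINISH) (task-\d+)"
)


def task_durations(lines, op_id):
    """Map task id -> seconds between its BEGIN and FINISH lines for one opID."""
    begins, durations = {}, {}
    for line in lines:
        match = TASK_RE.search(line)
        # Child operations append suffixes (-01, -02, ...) to the parent opID.
        if not match or not match.group(2).startswith(op_id):
            continue
        stamp = datetime.fromisoformat(match.group(1).replace("Z", "+00:00"))
        event, task = match.group(3), match.group(4)
        if event == "BEGIN":
            begins[task] = stamp
        elif task in begins:
            durations[task] = (stamp - begins[task]).total_seconds()
    return durations


OP_ID = "kpo0kwu5-9142709-auto-5fyk6-h5"
PREFIX = "[Originator@6876 sub=vpxLro opID=" + OP_ID + ":70331855-e9"
SAMPLE = [
    "2021-10-04T23:02:15.761Z info vpxd[06614] " + PREFIX + "] [VpxLRO] "
    "-- BEGIN task-679499 -- domain-c4727 -- vim.ComputeResource.reconfigureEx -- ",
    "2021-10-04T23:02:15.886Z info vpxd[07052] " + PREFIX + "-01] [VpxLRO] "
    "-- BEGIN task-679500 -- vesxi02.gs.labs -- DasConfig.ConfigureHost -- ",
    "2021-10-04T23:03:41.780Z info vpxd[06614] " + PREFIX + "] [VpxLRO] "
    "-- FINISH task-679499",
]

for task, seconds in task_durations(SAMPLE, OP_ID).items():
    print(f"{task}: {seconds:.3f}s")
```

A task that shows a BEGIN but never a FINISH for the same opID is a good place to start looking.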

Logs on the ESXi host - /var/log/vpxa.log

  1. Check for an existing FDM vib fails - Use the same opID from vCenter Server's vpxd.log: kpo0kwu5-9142709-auto-5fyk6-h5:70331855-e9-02
2021-10-04T23:02:21.528Z info vpxa[1052276] [Originator@6876 sub=Default opID=kpo0kwu5-9142709-auto-5fyk6-h5:70331855-e9-02-e7] [VpxLRO] -- ERROR lro-1728 -- serviceSystem -- vim.host.ServiceSystem.updatePolicy: vim.fault.NotFound:
--> Result:
--> (vim.fault.NotFound) {
-->    faultCause = (vmodl.MethodFault) null,
-->    faultMessage = <unset>
-->    msg = "Received SOAP response fault from [<cs p:00000054786245f0, TCP:localhost:8307>]: updatePolicy
--> The object or item referred to could not be found."
--> }
--> Args:
-->
--> Arg id:
--> "vmware-fdm"
--> Arg policy:
--> "on"
  1. FDM vib install - vim.host.PatchManager.InstallV2
2021-10-04T23:02:29.198Z info vpxa[1052257] [Originator@6876 sub=vpxLro opID=kpo0kwu5-9142709-auto-5fyk6-h5:70331855-e9-02-0] [VpxLRO] -- BEGIN task-602 -- patchManager -- vim.host.PatchManager.InstallV2 -- 52726c71-9ea1-609f-0a0c-7de8ad3517a9

2021-10-04T23:02:38.520Z info vpxa[1052270] [Originator@6876 sub=vpxaInvtHost opID=WFU-74b43f93 update=3608] Increment master gen. no to (715): HostChanged|configManager.firewallSystem:firewallInfo.ruleset["fdm"]
2021-10-04T23:02:39.545Z info vpxa[1052683] [Originator@6876 sub=vpxaInvtHost opID=WFU-71c44e94 update=3611] Increment master gen. no to (716): HostChanged|configManager.serviceSystem:serviceInfo.service["vmware-fdm"]
  1. FDM is added to Autostart - Reconfigure|configManager.autoStartManager:config
2021-10-04T23:03:44.251Z info vpxa[1052267] [Originator@6876 sub=vpxLro opID=kpo0kwu5-9142709-auto-5fyk6-h5:70331855-e9-02-01-3d] [VpxLRO] -- BEGIN lro-1761 -- autoStartManager -- vim.host.AutoStartManager.reconfigure -- 52726c71-9ea1-609f-0a0c-7de8ad3517a9
2021-10-04T23:03:44.257Z info vpxa[1052267] [Originator@6876 sub=vpxaInvtHost opID=kpo0kwu5-9142709-auto-5fyk6-h5:70331855-e9-02-01-3d] Increment master gen. no to (729): Reconfigure|configManager.autoStartManager:config
  1. FDM is running - configManager.serviceSystem:serviceInfo.service["vmware-fdm"].running
2021-10-04T23:03:16.939Z info vpxa[1052264] [Originator@6876 sub=vpxLro opID=kpo0kwu5-9142709-auto-5fyk6-h5:70331855-e9-02-01-a6] [VpxLRO] -- BEGIN lro-1748 -- serviceSystem -- vim.host.ServiceSystem.start -- 52726c71-9ea1-609f-0a0c-7de8ad3517a9
2021-10-04T23:03:18.537Z info vpxa[1052270] [Originator@6876 sub=vpxaInvtHost opID=WFU-77b4bdcc update=3641] Increment master gen. no to (723): HostChanged|configManager.serviceSystem:serviceInfo.service["vmware-fdm"].running
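
Since not every vpxa.log line carries the vCenter opID (some use WFU-... IDs), I find it easier to search for the marker strings themselves. The Python sketch below records the first timestamp at which each marker from the steps above shows up. The labels in MILESTONES and the function name first_seen are my own, and the marker strings are simply copied from these excerpts, so treat them as assumptions for other builds.

```python
# Substrings copied from the vpxa.log excerpts above; the keys are my own labels.
MILESTONES = {
    "fdm_service_missing": "vim.host.ServiceSystem.updatePolicy: vim.fault.NotFound",
    "vib_install_started": "vim.host.PatchManager.InstallV2",
    "autostart_updated": "Reconfigure|configManager.autoStartManager:config",
    "fdm_running": 'serviceInfo.service["vmware-fdm"].running',
}


def first_seen(lines):
    """Return {label: timestamp of the first line containing that marker}."""
    seen = {}
    for line in lines:
        for label, marker in MILESTONES.items():
            if label not in seen and marker in line:
                # The timestamp is the first space-separated field of the line.
                seen[label] = line.split(" ", 1)[0]
    return seen


SAMPLE = [
    "2021-10-04T23:02:21.528Z info vpxa[1052276] [Originator@6876 sub=Default] "
    "[VpxLRO] -- ERROR lro-1728 -- serviceSystem -- "
    "vim.host.ServiceSystem.updatePolicy: vim.fault.NotFound:",
    "2021-10-04T23:02:29.198Z info vpxa[1052257] [Originator@6876 sub=vpxLro] "
    "[VpxLRO] -- BEGIN task-602 -- patchManager -- vim.host.PatchManager.InstallV2 -- ",
    "2021-10-04T23:03:18.537Z info vpxa[1052270] [Originator@6876 sub=vpxaInvtHost] "
    'Increment master gen. no to (723): HostChanged|'
    'configManager.serviceSystem:serviceInfo.service["vmware-fdm"].running',
    "2021-10-04T23:03:44.257Z info vpxa[1052267] [Originator@6876 sub=vpxaInvtHost] "
    "Increment master gen. no to (729): Reconfigure|configManager.autoStartManager:config",
]

seen = first_seen(SAMPLE)
for label in MILESTONES:
    print(f"{label}: {seen.get(label, 'NOT FOUND')}")
```

The sample lines are shortened (the opID is dropped) to keep the example readable. Any label that prints NOT FOUND points at the step where the host-side configuration stopped.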

Logs on the ESXi host - /var/log/esxupdate.log

  1. vmware-fdm VIB added for install
2021-10-04T23:02:30Z esxupdate: 1088852: esxupdate: INFO: --- Command: update Args: ['update'] Options: {'viburls': ['file:///tmp/VMware_bootbank_vmware-fdm_7.0.2-17958471.vib'], 'meta': None, 'hamode': True, 'proxyurl': None, 'timeout': 30.0, 'retry': 5, 'loglevel': None, 'cachesize': None, 'cleancache': None, 'maintenancemode': None, 'nosigcheck': True}

2021-10-04T23:02:30Z esxupdate: 1088852: Transaction: INFO: Final list of VIBs being installed: VMware_bootbank_vmware-fdm_7.0.2-17958471
2021-10-04T23:02:30Z esxupdate: 1088852: imageprofile: INFO: Adding VIB VMware_bootbank_vmware-fdm_7.0.2-17958471 to ImageProfile ESXi-7.0.2-17630552-standard
  1. VIB download - vmware-fdm
2021-10-04T23:02:32Z esxupdate: 1088852: HostImage: INFO: Attempting to download VIB vmware-fdm
  1. VIB Live install
2021-10-04T23:02:36Z esxupdate: 1088852: LiveImageInstaller: DEBUG: Starting to enable VIBs: VMware_bootbank_vmware-fdm_7.0.2-17958471
2021-10-04T23:02:36Z esxupdate: 1088852: LiveImageInstaller: DEBUG: Live installing vmware-fdm-7.0.2-17958471
  1. Install complete.
2021-10-04T23:02:37Z esxupdate: 1088852: LiveImageInstaller: INFO: Starting service /etc/init.d/vmware-fdm...
2021-10-04T23:02:37Z esxupdate: 1088852: vmware.runcommand: INFO: runcommand called with: args = '['/etc/init.d/vmware-fdm', 'start', 'install']', outfile = 'None', returnoutput = 'True', timeout = '0.0'.
2021-10-04T23:02:37Z esxupdate: 1088852: LiveImageInstaller: DEBUG: Output: Not starting vmware-fdm now (install). Will be started separately. success
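
Given that the original problem was the wrong vib version being pushed, it can help to check which vmware-fdm version esxupdate actually installed. Here is a small Python sketch that reads the "Final list of VIBs being installed" line. The function names are hypothetical, and the handling of multiple VIBs on one line (split on commas or spaces) is an assumption, since the excerpt above only ever lists one.

```python
import re

# Message text taken from the esxupdate.log excerpt above.
FINAL_RE = re.compile(r"Final list of VIBs being installed: (.+)$")


def vibs_being_installed(lines):
    """Return the VIB IDs from 'Final list of VIBs being installed' lines."""
    vibs = []
    for line in lines:
        match = FINAL_RE.search(line.rstrip())
        if match:
            # Assumption: several VIBs would be separated by commas and/or spaces.
            vibs.extend(v for v in re.split(r"[,\s]+", match.group(1)) if v)
    return vibs


def vib_version(vib_id):
    """Return the part after the last underscore of Vendor_bootbank_name_version."""
    return vib_id.rsplit("_", 1)[1]


SAMPLE = [
    "2021-10-04T23:02:30Z esxupdate: 1088852: Transaction: INFO: "
    "Final list of VIBs being installed: VMware_bootbank_vmware-fdm_7.0.2-17958471",
    "2021-10-04T23:02:32Z esxupdate: 1088852: HostImage: INFO: "
    "Attempting to download VIB vmware-fdm",
]

for vib in vibs_being_installed(SAMPLE):
    print(vib, "->", vib_version(vib))
```

Comparing that version with the one vpxd says it is pushing shows quickly whether the host received the build you expected.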

You can also review /var/log/fdm.log on the ESXi host for further events!