I was stumped by a recent escalation: vSphere reported invalid credentials when logging in as a user who belongs to a different Active Directory forest.

Some background info:

  1. The two forests are in a two-way transitive trust
  2. vCenter was joined with Integrated Windows Authentication to a primary domain in one of the forests
  3. This is supported as per this VMware KB
  4. No DNS related issues. Forward/Reverse lookup works fine.

The logs only reported the error: Native platform error [code: 851968]

[2021-09-23T03:42:14.708Z tomcat-http--3 vsphere.local a340b2e0-b5ac-47cd-8f48-12223ef8eaa1 INFO com.vmware.identity.idm.server.IdentityManager] Authentication failed for user [test@gs.lab] in tenant [vsphere.local] in [34] milliseconds with provider [corp.local] of type [com.vmware.identity.idm.server.provider.activedirectory.ActiveDirectoryProvider]
[2021-09-23T03:42:14.708Z tomcat-http--3 vsphere.local a340b2e0-b5ac-47cd-8f48-12223ef8eaa1 ERROR com.vmware.identity.idm.server.ServerUtils] Exception 'com.vmware.identity.idm.IDMLoginException: Native platform error [code: 851968][null][null]' com.vmware.identity.idm.IDMLoginException: Native platform error [code: 851968][null][null]
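The numeric code is easier to reason about once it is split into fields. The sketch below assumes (the logs do not confirm this) that the "Native platform error" value is a GSS-API major status laid out as in RFC 2744, with the calling error in bits 24-31, the routine error in bits 16-23, and supplementary bits below that. The function name `decode_gss_major` is made up for illustration.

```shell
#!/bin/sh
# Split a GSS-API style major status into its fields (RFC 2744 layout).
# Assumption: the "Native platform error" code is a GSS major status.
decode_gss_major() {
    code=$1
    calling=$(( (code >> 24) & 0xFF ))   # calling error field
    routine=$(( (code >> 16) & 0xFF ))   # routine error field
    supp=$(( code & 0xFFFF ))            # supplementary info bits
    printf 'hex=0x%X calling=%d routine=%d supplementary=%d\n' \
        "$code" "$calling" "$routine" "$supp"
}

decode_gss_major 851968
```

Under that assumption the code is 0xD0000, i.e. routine error 13, which RFC 2744 defines as GSS_S_FAILURE. That is a generic failure with no mechanism detail, which is why deeper logging was the next step.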

Validated the two-way transitive trust using the command:

/opt/likewise/bin/lw-lsa get-status

The trust was reported as normal.

Enabled trace logging - more information in this VMware KB

/opt/likewise/bin/lwsm set-log-level trace
/opt/likewise/bin/lwio-set-log-info trace
/opt/likewise/bin/lwnet-set-log-level trace
/opt/likewise/bin/lw-set-log-level trace

From /var/log/messages

2021-09-23T07:00:42.105837+00:00 vcsa lsassd[1475]: 0x7fb346ffd700:[NtlmServerAcquireCredentialsHandle() ../lsass/server/ntlm/acquirecreds.c:103] Error code: 40506 (symbol: LW_ERROR_NO_CRED)
2021-09-23T07:00:42.106486+00:00 vcsa lsassd[1475]: 0x7fb369d7c700:[lwmsg_peer_log_message() ../lwmsg/src/peer-task.c:212] (assoc:0x7fb338000e00 >> 14355) CALL RES NTLM_R_GENERIC_FAILURE: 
    {
        dwError = 40506
    }
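At trace level the interesting lines are easy to lose in the noise. A small sketch that tallies the Likewise error codes reported on lsassd lines follows; `summarize_lsass_errors` is a hypothetical helper name, and the pattern assumes the `Error code: N (symbol: NAME)` format seen in the excerpt above.

```shell
#!/bin/sh
# Print each distinct "Error code: N (symbol: NAME)" pair found on lsassd
# lines, with a count, most frequent first. Reads log text on stdin.
summarize_lsass_errors() {
    grep 'lsassd\[' \
        | sed -n 's/.*Error code: \([0-9][0-9]*\) (symbol: \([A-Z_0-9]*\)).*/\1 \2/p' \
        | sort | uniq -c | sort -rn
}

# Example using the line captured above. On the appliance this would be:
#   summarize_lsass_errors < /var/log/messages
printf '%s\n' '2021-09-23T07:00:42.105837+00:00 vcsa lsassd[1475]: 0x7fb346ffd700:[NtlmServerAcquireCredentialsHandle() ../lsass/server/ntlm/acquirecreds.c:103] Error code: 40506 (symbol: LW_ERROR_NO_CRED)' \
    | summarize_lsass_errors
```

Each output line is a count followed by the numeric code and its symbol, so a code that dominates the log stands out immediately.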

The NTLM_R_GENERIC_FAILURE error was inconclusive, so we captured packets to look at the Kerberos responses coming back from the domain controller.

tcpdump -i eth0 -w vcsa-2309.capture

From the packet capture we were able to narrow the root cause down to an environmental issue:

  • AS-REQ was sent to the domain controller from vCenter Server. An AS-REQ is an Authentication Service message which exchanges credentials for tickets.
as-req
    pvno: 5
    msg-type: krb-as-req (10)
    padata: 1 item
        PA-DATA pA-REQ-ENC-PA-REP
            padata-type: pA-REQ-ENC-PA-REP (149)
                padata-value: <MISSING>
    req-body
        Padding: 0
        kdc-options: 00000010
        cname
            name-type: kRB5-NT-PRINCIPAL (1)
            cname-string: 1 item
                CNameString: test
        realm: gs.labs
        sname
            name-type: kRB5-NT-SRV-INST (2)
            sname-string: 2 items
                SNameString: krbtgt
                SNameString: gs.labs
        till: 2021-09-24 07:03:46 (UTC)
        nonce: 158691297
        etype: 3 items
            ENCTYPE: eTYPE-AES256-CTS-HMAC-SHA1-96 (18)
            ENCTYPE: eTYPE-AES128-CTS-HMAC-SHA1-96 (17)
            ENCTYPE: eTYPE-ARCFOUR-HMAC-MD5 (23)
  • AS-REP was returned successfully by the domain controller with the key. An AS-REP is the response to the Authentication Service request.
as-rep
    pvno: 5
    msg-type: krb-as-rep (11)
    padata: 1 item
        PA-DATA pA-ETYPE-INFO2
            padata-type: pA-ETYPE-INFO2 (19)
                padata-value: XXXX
    crealm: 
    cname
        name-type: kRB5-NT-PRINCIPAL (1)
        cname-string: 1 item
            CNameString: vcsa-test
    ticket
        tkt-vno: 5
        realm: gs.labs
        sname
            name-type: kRB5-NT-SRV-INST (2)
            sname-string: 2 items
                SNameString: krbtgt
                SNameString: gs.labs
        enc-part
            etype: eTYPE-AES256-CTS-HMAC-SHA1-96 (18)
            kvno: 2
            cipher: XXX
    enc-part
        etype: eTYPE-AES256-CTS-HMAC-SHA1-96 (18)
        kvno: 5
        cipher: XXX
  • TGS-REQ was sent next from the vCenter Server. A TGS-REQ is a Ticket Granting Service request. It is similar to the Authentication Service request, but it presents the ticket obtained in the previous step and names the service principal (sname) a ticket is being requested for. In the example below that is host/vcsa.gs.labs
tgs-req
    pvno: 5
    msg-type: krb-tgs-req (12)
    padata: 2 items
        PA-DATA pA-TGS-REQ
            padata-type: pA-TGS-REQ (1)
                padata-value: xxxx
        PA-DATA pA-FX-FAST
            padata-type: pA-FX-FAST (136)
                padata-value: xxxx
    req-body
        Padding: 0
        kdc-options: 00810000
        realm: gs.labs
        sname
            name-type: kRB5-NT-SRV-HST (3)
            sname-string: 2 items
                SNameString: host
                SNameString: vcsa.gs.labs
        till: 2021-09-23 17:03:45 (UTC)
        nonce: 1632380625
        etype: 3 items
            ENCTYPE: eTYPE-AES256-CTS-HMAC-SHA1-96 (18)
            ENCTYPE: eTYPE-AES128-CTS-HMAC-SHA1-96 (17)
            ENCTYPE: eTYPE-ARCFOUR-HMAC-MD5 (23)
  • The domain controller is supposed to reply with a TGS-REP. In this instance, however, we got an error: KRB5KDC_ERR_S_PRINCIPAL_UNKNOWN
krb-error
    pvno: 5
    msg-type: krb-error (30)
    stime: 2021-09-23 07:03:45 (UTC)
    susec: 569859
    error-code: eRR-S-PRINCIPAL-UNKNOWN (7)
    realm: gs.labs
    sname
        name-type: kRB5-NT-SRV-HST (3)
        sname-string: 2 items
            SNameString: host
            SNameString: vcsa.gs.labs
  • The KRB5KDC_ERR_S_PRINCIPAL_UNKNOWN error needs to be diagnosed further on the domain controller side. Why does the domain controller report the service principal as unknown when the vCenter Server has a healthy membership in a domain in the trusted forest?
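When reading a capture, the krb-error message only carries a number (7 in the dump above), so a quick lookup helps when comparing against other failures. A minimal sketch with a handful of codes from RFC 4120 is shown here. The helper name `krb_error_name` is hypothetical, the list is deliberately incomplete, and the names are the RFC spellings (MIT Kerberos prefixes them with KRB5, as in the text above).

```shell
#!/bin/sh
# Map a Kerberos error-code number to its RFC 4120 name.
# Only a few common codes are listed; anything else is reported as unknown.
krb_error_name() {
    case "$1" in
        6)  echo KDC_ERR_C_PRINCIPAL_UNKNOWN ;;  # client not found in KDC database
        7)  echo KDC_ERR_S_PRINCIPAL_UNKNOWN ;;  # server (service) not found in KDC database
        14) echo KDC_ERR_ETYPE_NOSUPP ;;         # no support for requested encryption type
        24) echo KDC_ERR_PREAUTH_FAILED ;;       # pre-authentication data was invalid
        25) echo KDC_ERR_PREAUTH_REQUIRED ;;     # additional pre-authentication required
        37) echo KRB_AP_ERR_SKEW ;;              # clock skew too great
        68) echo KDC_ERR_WRONG_REALM ;;          # request belongs to a different realm
        *)  echo "unknown ($1)" ;;
    esac
}

krb_error_name 7
```

Code 7 refers to the service principal (the sname, here host/vcsa.gs.labs) rather than the user, which is why the next step is to check how the domain controller resolves that principal.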