Contact

Subscribe via Email

Subscribe via RSS/JSON

Categories

Recent Posts

Creative Commons Attribution 4.0 International License
© Rakhesh Sasidharan

Elsewhere

Use PowerShell to get a list of GPOs without Authenticated Users in the delegation

Must have seen this recent Windows update that broke GPOs which were missing the Read permission for the “Authenticated Users” group. Solution is to get a list of these GPOs and add the “Authenticated Users” group to them. Here’s a one liner that gets you such a list –

This puts it into a file called GPOs.txt in the current directory. Remove/ Modify that last re-direct as needed.

VMware client – unable to login with username, password; but able to login with “use windows credentials”

We had this weird issue at work yesterday wherein you could not login to the vCenter server by entering a username/ password, but could if you just ticked on the “Use windows session credentials” checkbox.

The issue got resolved eventually by stopping the “VMware Secure Token Service”, restarting the “VMware VirtualCenter Server” service, and then starting the “VMware Secure Token Service”. No idea why that made a difference though, and whether that actually fixed things or was just coincidental. Around the same time I had seen some VMware Tools errors so I (a) upgraded the tools, (b) moved the vCenter VM to a different host, (c) saw that one of these had caused issues with the network driver so I had to uninstall and reinstall the tools and then reset the secure channel with the domain (since when the vCenter VM came up it didn’t have network connectivity).

So it was a bit of a damper actually. Nothing more frustrating than spending a lot of time troubleshooting something and not really figuring out what the issue is. On the plus side at least the issue got sorted, but it leaves me uneasy not knowing what really went wrong and whether it will re-occur.

In the event logs there were many entries like these:

An account failed to log on.

Subject:
    Security ID:        SYSTEM
    Account Name:        VCENTER01$
    Account Domain:        MYDOMAIN
    Logon ID:        0x3e7

Logon Type:            3

Account For Which Logon Failed:
    Security ID:        NULL SID
    Account Name:        SomeAccount
    Account Domain:        MYDOMAIN.COM

Failure Information:
    Failure Reason:        Unknown user name or bad password.
    Status:            0xc000006d
    Sub Status:        0xc0000064

Process Information:
    Caller Process ID:    0xe20
    Caller Process Name:    E:\Program Files\VMware\Infrastructure\VMware\CIS\vmware-sso\VMwareIdentityMgmtService.exe

Network Information:
    Workstation Name:    VCENTER01
    Source Network Address:    –
    Source Port:        –

Detailed Authentication Information:
    Logon Process:        Advapi  
    Authentication Package:    Negotiate
    Transited Services:    –
    Package Name (NTLM only):    –
    Key Length:        0

Here’s what the error codes mean –

  • NULL SID suggests that the account that was being authenticated could not be identified
  • 0xC000006D means that authentication failed due to bad credentials
  • 0xC0000064 means that the requested user name does not exist.
  • Logon type 3 means the request was received from the network (but given the request originated from “server”, suggests that the request was looped back from itself over the network stack.

Not that it throws much light on what’s happening.

For info – this KB article lists the useful vCenter log files. I looked at the vpxd-xxxx.log file which had some entries like these –

2016-06-06T16:08:18.046+01:00 [02856 error ‘[SSO]’ opID=138a737d] [UserDirectorySso] AcquireToken exception: class SsoClient::CommunicationException(No connection could be made because the target machine actively refused it)
2016-06-06T16:08:18.046+01:00 [02856 error ‘authvpxdUser’ opID=138a737d] Failed to authenticate user <mydomain\someaccount>

This file is under C:\ProgramData\VMware\VMware VirtualCenter\Logs by the way.

I also found messages like these –

2016-06-06T10:17:59.226+01:00 [06952 error ‘[SSO]’ opID=1790eabb] [UserDirectorySso] AcquireToken exception: class SsoClient::SsoException(Failed to parse Group Identity value: `\Authentication authority asserted identity’; domain or group missing)

Two more logs I looked at are C:\ProgramData\VMware\CIS\logs\vmware-sso\vmware-sts-idmd.log and some files under C:\ProgramData\VMware\CIS\runtime\VMwareSTS\logs. In case of the latter location I just sorted by the recently modified timestamp and found some logs to look at. I focused on one called ssoAdminServer.log. This file had a few entries like these –

[2016-06-06 12:19:08,987 pool-11-thread-1  ERROR com.vmware.identity.admin.server.ims.impl.PrincipalManagementImpl] Idm client exception
com.vmware.identity.idm.IDMException: Invalid group name format for [\Authentication authority asserted identity]
    at com.vmware.identity.idm.server.ServerUtils.getRemoteException(ServerUtils.java:131)
    at com.vmware.identity.idm.server.IdentityManager.findNestedParentGroupsInternal(IdentityManager.java:4006)

I found mention of this message in a forum post which pointed to this being a known issue for vCenter installed on a 2012 server with a 2012 DC. That doesn’t apply to me.

The vSphere Web Client gives an error message “Cannot Parse Group Information” – which too is a symptom if you install vCenter on a 2012 server with a 2012 DC. Moreover it applies to vCenter 5.5 GA, which is what we are on, so all the symptoms point to that issue but it is not so in our case. :(

Back to the vmware-sts-idmd.log, that had entries like these –

2016-06-06 09:00:26,089 WARN   [ActiveDirectoryProvider] obtainDcInfo for domain [VCENTER01] failed Failed to get domain controller information for VCENTER01(dwError – 1355 – ERROR_NO_SUCH_DOMAIN)
2016-06-06 09:00:26,090 WARN   [ActiveDirectoryProvider] obtainDcInfo for domain [VCENTER01] failed Failed to get domain controller information for VCENTER01(dwError – 1355 – ERROR_NO_SUCH_DOMAIN)
2016-06-06 09:00:26,091 ERROR  [ValidateUtil] resolved group name=[\Authentication authority asserted identity] is invalid: not a valid netbios name format  
2016-06-06 09:00:26,092 INFO   [ActiveDirectoryProvider] resolved group name=[\Authentication authority asserted identity] is invalid: not a valid netbios name format  
…<snip>…
2016-06-06 09:02:53,005 INFO   [IdentityManager] Failed to find principal [SomeAccount@mydomain.tld] as FSP group in tenant [vsphere.local]
2016-06-06 09:02:53,008 INFO   [IdentityManager] Failed to find FSP user or gorup [SomeAccount@mydomain.tld]’s nested parent groups in tenant [vsphere.local]
2016-06-06 09:02:53,013 ERROR  [IdentityManager] Failed to find nested parent groups of principal [SomeAccount@mydomain.tld] in tenant [vsphere.local]
2016-06-06 09:02:53,013 ERROR  [ServerUtils] Exception ‘java.lang.IllegalStateException: Invalid group name format for [\Authentication authority asserted identity]’
java.lang.IllegalStateException: Invalid group name format for [\Authentication authority asserted identity]
    at com.vmware.identity.idm.server.provider.activedirectory.ActiveDirectoryProvider.findNestedParentGroupsByPac(ActiveDirectoryProvider.java:2140)
    at com.vmware.identity.idm.server.provider.activedirectory.ActiveDirectoryProvider.findNestedParentGroups(ActiveDirectoryProvider.java:791)
    at com.vmware.identity.idm.server.IdentityManager.findNestedParentGroupsInternal(IdentityManager.java:3985)
    at com.vmware.identity.idm.server.IdentityManager.findNestedParentGroups(IdentityManager.java:3856)
    at sun.reflect.GeneratedMethodAccessor37.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at sun.rmi.server.UnicastServerRef.dispatch(Unknown Source)
    at sun.rmi.transport.Transport$1.run(Unknown Source)
    at sun.rmi.transport.Transport$1.run(Unknown Source)
    at java.security.AccessController.doPrivileged(Native Method)
    at sun.rmi.transport.Transport.serviceCall(Unknown Source)
    at sun.rmi.transport.tcp.TCPTransport.handleMessages(Unknown Source)
    at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run0(Unknown Source)
    at sun.rmi.transport.tcp.TCPTransport$ConnectionHandler.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)

Again, something to do with DC/ domain … but what!? Found this blog post too that suggested the same.

For my reference, here’s a KB article listing all the SSO log files. And this is a useful blog post in case I happen upon a similar issue later (the case of the flapping VMware Secure Token Service). As is this KB article on an SSO facade error.

Solarwinds not seeing correct disk size; “Connection timeout. Job canceled by scheduler.” errors

Had this issue at work today. Notice the disk usage data below in Solarwinds –

Disk Usage

The ‘Logical Volumes’ section shows the correct info but the ‘Disk Volumes’ section shows 0 for everything.

Added to that all the Application Monitors had errors –

Timeout

I searched Google on the error message “Connection timeout. Job canceled by Scheduler.” and found this Solarwinds KB article. Corrupt performance counters seemed to be a suspect. That KB article was a bit confusing me to in that it gives three resolutions and I wasn’t sure if I am to do all three or just pick and choose. :)

Event Logs on the target server did show corrupt performance counters.

Initial Errors

I tried to get the counters via PowerShell to double check and got an error as expected –

Broken Get-Counter

Ok, so performance counter issue indeed. Since the Solarwinds KB article didn’t make much sense to me I searched for the Event ID 3001 as in the screenshot and came across a TechNet article. Solution seemed simple – open up command prompt as an admin, run the command lodctr /R. This command apparently rebuilds the performance counters from scratch based on currently registry settings adn backup INI files (that’s what the help message says). The command completed straight-forwardly too.

lodctr - 1

With this the performance counters started working via PowerShell.

Working Get-Counter

Event Logs still had some error but those were to do with the performance counters of ASP.Net and Oracle etc.

More Errors

The fix for this seemed to be a bit more involved and requires rebooting the server. I decided to skip it for now as I don’t these additional counters have much to do with Solarwinds. So I let those messages be and tried to see if Solarwinds was picking up the correct info. Initially I took a more patient approach of waiting and trying to make it poll again; then I got impatient and did things like removing the node from monitoring and adding it back (and then wait again for Solarwinds to poll it etc) but eventually it began working. Solarwinds now sees the disk space correctly and all the Application Monitors work without any errors too.

Here’s what I am guessing happened (based on that Solarwinds KB article I linked to above). The performance counters of the server got corrupt. Solarwinds uses counters to get the disk info etc. Due to this corruption the poller spent more time than usual when fetching info from the server. This resulted in the Application Monitor components not getting a chance to run as the poller had run out of time to poll the server. Thus the Application Monitors gave the timeout errors above. In reality the timeout was not from those components, it was from the corrupt performance counters.

Get a list of services and “Log On As” accounts

Wanted to find what account our NetBackup service is running under on a bunch of servers –

You have to use WMI for this coz Get-Service doesn’t show the Log On As user.

Wheee!! Had a tweet from Jeffrey Snover for this post.

 

 

Following on that tweet I noticed something odd.

The following command works –

Or this –

In the second one I am explicitly casting the arguments as an array.

But this variant doesn’t work –

That generates the following error –

Get-WmiObject : The RPC server is unavailable. (Exception from HRESULT: 0x800706BA)
At line:1 char:1
+ Get-WmiObject Win32_Service -cn $Servers -Filter ‘Name= “NetBackup Client Service …
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : InvalidOperation: (:) [Get-WmiObject], COMException
    + FullyQualifiedErrorId : GetWMICOMException,Microsoft.PowerShell.Commands.GetWmiObjectCommand

Get-WmiObject : The RPC server is unavailable. (Exception from HRESULT: 0x800706BA)
At line:1 char:1
+ Get-WmiObject Win32_Service -cn $Servers  -Filter ‘Name= “NetBackup Client Service …
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    + CategoryInfo          : InvalidOperation: (:) [Get-WmiObject], COMException
    + FullyQualifiedErrorId : GetWMICOMException,Microsoft.PowerShell.Commands.GetWmiObjectCommand

The error is generated for each entry in the array.

It looks like when I pass the list of servers as an array variable PowerShell uses a different way to connect to each server (PowerShell remoting/ WinRM) while if I specify the list in-line it behaves differently. I didn’t search much on this but found this Reddit thread with the same issue. Something to keep in mind …