Contact

Subscribe via Email

Subscribe via RSS

Categories

Creative Commons Attribution 4.0 International License
© Rakhesh Sasidharan

Useful offline Windows troubleshooting/ fixing tricks

Had a Windows Server 2008 R2 server that started giving a blank screen since the recent Windows update reboot. This was a VM and it was the same result via VMware console or RDP. Safe Mode didn’t help either. Bummer!

Since this is a VM I mounted its disk on another 2008 R2 VM and tried to fix the problem offline. Most of my attempts didn’t help but I thought of posting them here for reference. 

Note: In the following examples the broken VM’s disk is mounted to F: drive. 

Recent updates

I used dism to list recent updates and remove them. To list updates from this month (March 2017):

To remove an update:

I did this for each of the updates I had. That didn’t help though. And oddly I found that one of the updates kept re-appearing with a slightly different name (a different number suffixed to it actually) each time I’d remove it. Not sure why that was the case but I saw that F:\Windows\SxS had a file called pending.xml and figured this must be doing something to stop the update from being removed. I couldn’t delete the file in-spite of taking ownership and full control, so I opened it in Notepad and cleared all the contents. :o) After that the updates didn’t return but the machine was still broken. 

SFC

I used sfc to check the integrity of all the system files:

No luck with that either!

Event Logs

Maybe the Event Logs have something? These can be found at F:\Windows\System32\Winevt\Logs. Double click the ones of interest to view. 

In my case the Event Logs had nothing! No record at all of the VM starting up or what was causing it to hang. Tough luck!

Bonus info: HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Eventlog contains locations of the files backing the Event Logs. Just mentioning it here as I came across this.

Drivers

Could drivers cause any issue? Unlikely. You can’t use dism to query drivers as above but you can check via registry. See this post. Honestly, I didn’t read it much. I didn’t suspect drivers and it seemed too much work fiddling through registry keys and folders. 

Last Known Good Configuration

Whenever I’d boot up the VM I never got the Last Known Good (LKG) Configuration option. I tried pressing F8 a couple of times but it had no effect. So I wondered if I could tweak this via the registry. Turns out I can. And turns out I already knew this just that I had forgotten!

Your current configuration is HKLM\System\CurrentControlSet. This is actually a link to HKLM\System\CurrentControlSet01 or HKLM\System\CurrentControlSet02 or HKLM\System\CurrentControlSet03 or … (you get the point). Each of the CurrentControlSetXXX key is one of your previous configurations. The one that’s actually used can be found via HKLM\System\Select. The entry Current points to the number of the CurrentControlSetXXX key in use. The entry LastKnownGood points to the Last Known Good Configuration. Now we know what to do. 

  1. Mount the HKLM\SYSTEM hive of the broken VM. All registry hives can be found under %windir%\System32\Config. In my case that translates to the file F:\Windows\System32\Config\SYSTEM.
  2. To mount this file open Registry Editor, select the HKLM hive, and go to File > Load Hive. (This is a good post with screenshots etc).  
  3. Go to the Select key above. Change Current to whatever LastKnownGood was. 
  4. That’s all. Now unload the hive and you are done.

This helped in my case! I was finally able to move past the blank screen and get a login prompt. Upon login I was also able to download and install all the patches and confirm that the VM is now working fine (took a snapshot of course, just in case!). I have no idea what went wrong, but at least I have the pleasure of being able to fix it. From the post I link to below, I’d say it looks like a registry hive corruption. 

Since I successfully logged in, my machine’s Last Known Good Configuration will be automatically updated by Windows with the current one. Here’s a blog post that explains this in more detail. 

That’s all! Hope this helps someone. 

Active Directory: Troubleshooting Domain Controller critical services

These are notes from the AD Troubleshooting WorkshopPLUS session I attended. The notes are on troubleshooting Domain Controller critical services. I am mostly following what was discussed in class here rather than add anything new (except in the section of SC where I talk a bit about it).

Before moving on let’s recap the DC critical services from my previous post:

  • DHCP client / DNS client – registers the DCs A and PTR records
    • DHCP client for Server 2003 and prior
    • DNS client for Server 2008 and later
  • FRS / DFSR – responsible for SYSVOL replication between DCs
    • FRS is now deprecated, may or may not be used in the domain. DFSR is the replacement.
    • If the domain was born in functional level 2008 (i.e. all DCs are Server 2008 or later) then DFRS is used.
    • Else FRS could be in use unless it was migrated.  
  • DNS server – used by DCs to locate each other, clients to locate DCs
  • KDC – used for Kerberos authentication in the domain
  • Netlogon – maintains secure channel between DCs and other DCs and clients; also updates DNS with the SRV records
    • Secure channel is used for Kerberos authentication and AD replication
    • DNS records are also written to %systemroot%\system32\config\Netlogon.DNS in case manual updating of DNS server is required.
  • Windows Time – maintains correct time in the domain, required for Kerberos authentication and AD replication
  • AD DS – provides AD
  • AD WDS – provides a web interface to AD

Event Viewer

In case of issues the Event Viewer is the best place to start troubleshooting from. Bear in mind merely looking at the System and Application logs as most admins do is not enough. AD specific events are usually logged under the Custom Views > Server Roles section. 

ad-events

Event IDs for some of the common problems can be found at this link. Some more event IDs and their resolution can be found at this link. The previous two links are worth a read in that they also give a high level overview of AD and troubleshooting.  

DcDiag

This has a separate post of its own now.

Service Controller (SC)

This is a command I haven’t used much except in the context of checking for drivers. Try the following if you want to get a list of all active drivers on your system:

Omit the pipe and findstr after that if you want more details. SC is cool in that it can do remote computers too:

But drivers are just one type of objects SC can query. If you omit the type= driver SC returns services (and if you set type= All SC returns both drivers and services).

For example, to get a list of all services on the machine

An example entry in the output looks like this:

Too much info, so to output just the Service Name, Display Name, and State use findstr:

Services can be stopped and started using the following commands:

 

SC has its limitations though, in that you can’t stop a service if it has other services dependent on it. To my knowledge SC doesn’t have a way of enumerate services that depend on a particular service either, so there’s no way to manually stop all those services via a batch file or something. That said, SC can find which services a particular service depends upon via the sc qc command. For example:

Given a service you can also get its description. For example:

Like I said, I don’t use SC much except to query drivers. What I typically use for querying services is PowerShell.

PowerShell

  • Start-Service
  • Stop-Service
  • Restart-Service
  • Get-Service

I have noticed that sometimes the results from Get-Service and sc query vary. A recent example was when I did Get-Service NTDS on a Server 2008 R2 machine and it returned nothing while sc query NTDS returned results as expected.

Even WMIC is able to find NTDS above, but Get-Service doesn’t. Go figure!

Be mindful of the symptoms

One thing that was emphasized in class a lot is that while troubleshooting start with the symptoms (doh!). As in, think of the symptoms you are experiencing and work backwards from them as to what critical services could be down/ broken which might be leading to these symptoms. That will give you a good starting point to troubleshoot and then you can use the tools above to dig deeper and identify the problem. AD is a complex system made up of many moving parts, so a good understanding of the underlying structure and how they tie in together is important.

Enabling disabled Office add-ins automatically

So as I mentioned yesterday I have been thinking of using Task Scheduler to try and enable disabled Office add-ins automatically.

The Problem

Here’s what happens. Every so often when users open Word, Outlook, Excel, etc they get a prompt like the one below:

addin-error

We don’t know why they keep getting it, but ideally users should be clicking “NO” as we use all these add-ins and don’t want them disabled. But users being users they click “YES” intentionally or inadvertently and then call IT with complaints that their Word, Outlook, etc aren’t working as expected. Then we in IT have to do the boring task of going to their add-ins screen and enabling these items; followed by closing and opening Word, Outlook, whatever.

What I want to do here is (1) warn users when the above dialog box comes up, advising them not to click the “YES” button; and (2) if they do click “YES” then somehow enable the add-ins behind their back and get them to close and open the affected program.

Add-ins and the registry

I started off thinking I might need to do some fancy stuff to enable add-ins, but soon realized that they are all controlled by the registry. Nice!

  • Add-ins can be installed for all users on the machine or just a specific user. So you have to look under both HKLM and HKCU. The two locations are: addin-registryHKCU\Software\Microsoft\Office\(application)\Addins\ and HKLM\Software\Microsoft\Office\(application)\Addins\ (replace (application) with Word, Outlook, Excel, or PowerPoint). These two locations contain keys for each add-in installed for that particular application.Here’s a screen shot for the Outlook InterAction add-in for a user.
  • The key thing in the registry keys above is the LoadBehavior entry. This can take many values – the default is a value of 3 which means the add-in is loaded automatically at startup and is currently loaded. If the value is 2 it means the add-in is loaded automatically at startup but is currently not loaded.
  • My initial hunch was that LoadBehavior is what I must target. I thought when a user disables an add-in the LoadBehavior value of that add-in probably changes to 2, so all I need to do is switch it back to 3. But I was wrong (as I quickly figured after crashing Word a couple of times and choosing to disable add-ins).
  • Then I discovered that add-ins which are disabled via a user dialog box as above don’t have their LoadBehavior value changed; rather they are set as disabled at another place in the registry. And that is under the HKCU hive, at HKCU\Software\Microsoft\Office\(version)\(application)\Resiliency\DisabledItems (replace (version) with 15.0 for Office 2013, 14.0 for Office 2010, 12.0 for Office 2007, 11.0 for Office 2003, you get the idea … and replace (application) with Word, Outlook, etc as before).

Each time an add-in is disabled, a REG_BINARY entry is created under HKCU\Software\Microsoft\Office\(version)\(application)\Resiliency\DisabledItems with a binary value. I didn’t experiment to see if there’s a one-to-one mapping between this entry and an add-in, and also whether the entry is same for an add-in across all machines; the important finding for me was that if I delete this entry, it enables the add-in. So all I really needed to do to see if there are disabled add-ins for a program is to just check this key, and if there are any entries I can delete them all to re-enable all the add-ins. Nice! This doesn’t require a PowerShell script really. Good old reg can do the trick too. So a batch file is more than sufficient.

Below is what I came up with:

I could probably make it shorter if I pick up more batch file scripting but this does the trick for me.

Let’s break it down for Outlook. I enumerate the registry key. The results of this are iterated over by a FOR loop. If the list isn’t empty – i.e. there are disabled add-ins – the loop exits and goes to a different section.

This section that the loop exits to simply deletes all entries under that key, shows a message to the user asking them to close and open Outlook, and goes up the batch file to check Word next.

Repeat and rinse for each application and there you have it! Like I said, easy peasy, quick and dirty!

Maybe I should improve the script later – I don’t know. An overhaul using PowerShell might be a good idea I think. Specifically, there are a few things I am already thinking of adding:

  1. Currently the message box goes away if the user clicks OK. Would be nice if I could keep an eye out for whether they actually close the program later, and if not keep popping up reminders – maybe a ballon tip?
  2. Would be nice if I could close the application for the user. I could do that in the batch file too via taskkill but I was wary of doing that lest it cause any corruptions. I wouldn’t want Outlook OST and PST files or Word templates getting corrupted because I closed the program. Maybe PowerShell has a more graceful way of shutting down an app? I don’t think so, but that could be my ignorance.
  3. Currently I enable all add-ins. While that’s fine for our case maybe I should have a list of add-ins that are to be excluded from being enabled? Or only enable a whitelist of add-ins?

So that’s that. Next step is to find a way to trigger the batch file.

Scheduled task triggers and Event Viewer

Each time the dialog box asking a user to disable an add-in appears, an entry is logged in Event Viewer. This looked like a promising way to hook on to that event and warn the user not to click “YES”. These entries are logged under “Application and Services Logs” > “Microsoft Office Alerts”.

addin-ev-alert

The joy of discovering that was short lived though, as I realized the same sort of alerts are generated for many other Office related things: spell check notifications, connectivity to Exchange, etc.

misc-ev-alert

All these alerts had the same source and ID and there was nothing for me to distinguish between them! Sure the data part was different, and while Event Viewer (and thus Task Scheduler) supports XPath for searching the Event Logs it has limitations in that I can’t do wildcard searches. Thus while I could do an XPath search for all events that have the exact error message in the first Event Log entry above, I can’t do an XPath search for all events that contain the text “Do you want to disable this add-in?”. This is a big limitation because now I will have to (a) find the exact message for each add-in and program, and (b) create XPath queries for each of these – eugh! That’s too much work for now (and not very elegant either!) so I decided to not pursue that route.

Once a user decides to enable or disable an add-in though, I found a different alert is generated for that. Under the “Application” log, from source “Microsoft Office 14” (since we have Office 2010), alerts with event ID 2001 are generated each time a user selects to disable an add-in (and alerts with event ID 2000 are generated when they choose not to disable).

2001-alert

That’s convenient, coz now I can create a task that is triggered on such events and whose action is to run the previous batch file. Nice!

Putting it all together

End result is a task I can import as in the case of the laptop lid issue. Here’s the XML file of the task:

Once again, I install it for the users I want via a batch file like this:

Not done yet!

I am not done yet though. It’s not practical installing it like this for each user in my domain. I chose the above approach just to get it installed for a few test users; next step is to roll it out via GPO to the whole domain. Stay tuned for a post on that later …!

And one more thing …

Something I learnt while reading about add-ins.

Non admin accounts keep losing their delegated rights to admin accounts (part 1)

Here’s something that I learnt the other day.

At work we use a bunch of admin accounts for various tasks. Previously all these admin accounts were part of the Domain Admins group, but recently in a drive to tighten down things we removed many of these accounts the from Domain Admins group. These accounts are still members of the other built in groups such as Account Operators and/ or Server Operators though, but not Domain Admins.

After removal we noticed that the accounts that were not Domain Admins could no longer reset passwords or unlock accounts for any admin accounts. Not surprising – since the Domain Admins group is what has such rights on all accounts once these users are removed from the Domain Admins group they naturally lost their rights. This needed fixing so here’s what we did: all our admin accounts (both Domain Admins and others) were in an OU called “Admin Accounts”, so we put the accounts that were not in the Domain Admins group into a group called Limited Admins and delegated this group rights to reset passwords on the “Admin Accounts” OU.

delegate

rights

Notice the Limited Admins group has a reset password Access Control Entry (ACE) on the “Admin Accounts” OU. This is a result of the delegation. If I check an individual account in this OU, the ACE entry is present on it too.

rights2

Once this was done and dusted a funny thing happened. Initially the Local Admin groups members could reset everyone’s passwords but soon they complained they were unable to. We checked the OU and an example Domain Admin account and noticed the previous ACE was no longer present. The ACE was still present on the OU and on accounts that were not members of the Domain Admins group, but it was missing from accounts that were members of the Domain Admins group or even groups such as Account Operators and Server Operators. Very odd!

We checked whether any of the other admins were removing these rights intentionally but none were. Next we checked the Event Viewer but that didn’t have anything to add. Finally we enabled auditing of account management activities to see if that sheds some light. An important point (which I had missed out initially) is that to view the extra details one must check the Event Viewer of the Domain Controller with the PDC Emulator role. To find out the DC with the PDC Emulator role open “AD Users and Computers”, right click on the domain name, select Operations Master:

pdcemul

Sure enough when we went through the Event Viewer of this DC there was an entry which explained what was happening:

event 4780

Interesting! At least this explains what was happening. And now that we knew what was happening the next step was to read more about the AdminSDHolder object and tweak things so our accounts didn’t get their ACEs stripped. This post took longer than expected to type up, so more on that in my next post …

Using Get-WinEvent to look at Windows event logs

Playing around with Get-WinEvent today. I find it very useful, especially when dealing with remote computers (as I have to at work). Launching Event Viewer, connecting to a remote computer (or even local computer), and then sifting through logs (or creating filters to sift) seems very cumbersome when I can acheive the same results much faster via PowerShell.

As you might have seen the Event Viewer has various logs. Using Get-WinEvent you can select which logs to focus on. To get a list of available logs do the following:

Probably better to filter through format-table for neater output:

To view details of a specific log, replace * with the name (and pipe output to format-list to view all the details):

To view all events in a log do the following:

There’s a lot of output, so good to restrict the number of entries:

Strangely there’s no easy way to restrict the entries starting at a certain time. There is, but no easy switchy way. Instead you have to do the following:

We use the -FilterHashTable switch here. This takes a hash table containing the log name as well as other parameters to filter on (you can start time, end time, provider name, ID, etc, and can combine multiple parameters). In the example above I want all events from the “System” log for the last 2 hours – so I use the get-date cmdlet, use it’s method AddHours() and set the number of hours to be added as -2.

In the next example I filter all events from the “System” log with event ID 7036 starting from now yesterday up to an hour ago.

To change the displayed order of events from newest-first to oldest-first use the -Oldest switch.

Beats piping the output to a sort cmdlet to reverse-sort!

Similar to how you can list the lognames you can list providers too:

This gives a list of the providers (the applications or sources that log events) and the logs to which they send events. The list can be huge, so a good idea is to pipe the output through where-object to filter by the providers you are interested in:

Note: The case doesn’t really matter above (“PowerShell”) because the -match operator is case in-sensitive by default. Use -cmatch if you care about case.

Back to the -FormatHashTable switch: this one takes a “ProviderName” key too containing the name of the provider you wish to filter on. The name can contain wildcards if you want to search multiple providers. For instance, the example below searches the “Application” log for all entries by the Citrix providers (there are many hence the wildcard) logged in the past three hours:

Note: In the example above, I can omit the logname. The hashtable requires either the log name or the provider name (or the path to a log). If you don’t care about the log name or provider name, mention one of these but leave the value as a wildcard. Like thus:

The above example gets all logs from the past 3 hours.

The default output of Get-WinEvent includes a lot of fields. Best to use format-table or select-object to only show what you want. In the example below I use select-object to select just the Message, ID, and TimeCreated properties. Further I pipe the output to a CSV file (doing that just to show how easy it is to quickly pull some remote logs into a CSV file):

Taking PowerShell on a date

I needed to filter the event logs from a remote computer for power off events. The user was powering it off himself every now and then, but wouldn’t believe me when I said so. No problem, thanks to the event logs all his shutdown requests are logged so it’s only a matter of culling it out.

I could have user Event Viewer but PowerShell is way easier. I only need to check the System event logs for any events with ID 1074. Further, for every shutdown, events with this ID are logged twice – one starting with the line “The process C:Windowssystem32winlogon.exe …” the other starting with the line “The process Explorer.EXE has initiated the power off …” – so I’d like to filter the output to just include one of these lines. Finally, I’d also like the day of the week to be shown.

So here’s what I did:

So far so good. I use Get-WinEvent to return the event logs I want, use the Message property of these returned objects to filter logs containing just the text we want. And finally I use the TimeCreated property of these objects to display the time and day. This TimeCreated property is the subject of this post so let’s focus on that further.

First, how do we get the members and properties of the objects returned by Get-WinEvent? Here’s how:

Notice the TimeCreated property.

The TimeCreated property is an object of class DateTime. To get the day of the date instance, use the TimeCreated.DayofWeek property. Which is what I extract via the Format-Table expression in my initial code:

This post from the “Hey, Scripting Guy! Blog” is a good read on Get-WinEvent and the DateTime class.

Focussing on the DateTime class, this is the type of the object returned by the get-date cmdlet too. Although not obvious from the syntax of the get-date cmdlet, you can use it to return a DateTime object instance of the date you specify as string. Like thus:

What’s more, you can even format the string:

The -Format switch takes a format specifier. As seen above “dd” stands for the date, “ddd” stands for the day in short-hand, “dddd” stands for the date in longhand, and so on.

You can also typecast a string to the DateTime class:

The typecast technique has a limitation though in that it expects the input to be in the format of US, i.e. “MM/dd/YYYY”. The DateTime class has a static member called Parse(), however, that is smart enough the use the country/ culture of the computer.

How do we parse the date if it’s in a different format though? Both the get-date cmdlet and the [datetime]::Parse() method fail if the input string does not match the date format of the computer:

It is possible to specify the date format of the input string. This only works with the Parse() method though. Like thus (thanks to this blog post):

The Globalization.cultureinfo class has a GetCultureInfo() method, so what we are doing above is we create a variable that is an instance of the “en-US” culture and then pass this variable to the Parse() method so it knows which format to parse the input in. Of course we could have also specified the whole thing on one line:

Let’s examine the Globalization.cultureinfo class. First, let’s examine its static members and properties:

The GetCulture() method can be used to list all available cultures:

To get details about a specific culture use the GetCultureInfo() method:

The LCID corresponds to the Locale ID assigned by Microsoft. To see all the properties use the get-member cmdlet or format-list to see the values:

On the topic of dates, the DateTimeFormat property seems interesting:

Notice the ShortDatePattern above. This is exactly what the -Format switch of Get-Date cmdlet expects, so one can use this to format the output of Get-Date in a different locale. Like thus:

So now we have seen to use the [datetime]::Parse() method to parse an input string into a DateTime object while specifying the country format of the input string. We have also seen how to output the result of get-date in the format of a different country. Both methods use the [Globalization.cultureinfo] class.

Lastly, there is an easy way to parse any string – irrespective of whether it matches the format of a country or not – into a DateTime object via the ParseExact() method. This is worth mentioning as without such a method one would have to resort to regular expressions and such!

In the example below we match an input string with time “02:50PM” by specifying the format as “hh:mmtt” to the ParseExact() method.

I am not sure what the third parameter of this method is supposed to be. Most examples put it as $null. It seems to take a “culture” as input, but I am not sure how it affects the output. Instead of $null the following too seem to work: $(get-culture), [Globalization.Cultureinfo]::GetCultureInfo("en-US") (“en-GB” too works), and [Globalization.Cultureinfo]::InvariantCulture.

The format can contain other text too. Escape the characters of such text with a slash or put the text within quotes ':

And just to show what happens if the format does not match:

Thanks to these two posts for showing me ParseExact().

Find the computer from where an AD account is locked out

I want to find out where from a user account is locked out in my domain.

The manual way to do this would be to open up Event Viewer, scan the event logs on the DC for event ID 4740, open it up and see the message to identify the machine from where this account was locked out. But using PowerShell we can obviously automate this way easily!

First things first: how to scan the event logs using PowerShell? You can use either of Get-EventLog or Get-WinEvent. The former only works on the classic event logs while the latter works on both the classic and newer event logs such as those found in Windows Vista and later. I find the latter faster too, so let’s use that. Here’s me running that cmdlet to filter the security logs for event ID 4740 on a remote DC:

There’s only one event here so the details of that are returned. It gives us all the info we need at a glance: user name “User01” is locked thanks to a bad password attempt from the computer “MANGO”.

Neat!

Now let’s go about making this prettier and better.

In the code above I am specifying the domain controller to query. That’s only helpful if I have an idea of the machine or site the user is trying to login to, and so know the domain controller responsible for that site. A better approach would be to query the domain controller with the PDC Emulator FSMO role for event ID 4740. This DC is the one that eventually processes the account lockout requests for all accounts in the domain.

The cmdlet Get-ADDomainContoller gives you the domain controller of the site you are in. Adding the switch -Filter * gives you all the domain controllers in the domain. And piping this through a where-object to filter out the domain controller with the “PDC Emulator” FSMO role gives you the domain controller we are interested in. Like thus:

The snippet above returns an object for the DC with the PDC Emulator role, to extract just the name do thus:

Next, let’s go about presenting the output above in a tidier format.

One thing to note is that all the info we want is in a property called Message.

So one option would be to do some fancy filtering using foreach-object like thus:

This will list out the multiple lockout entries with the timestamp on top in red.

A better option, however, is to use the Properties property, which is an array containing exactly the info we want as values.

Format-Table is your friend here:

A one line code that’s way better than opening Event Viewer and filtering through the logs manually.

And if you want to filter by the locked out status of a specific $user, pipe the output through a where-object like thus: