Creative Commons Attribution 4.0 International License
© Rakhesh Sasidharan

Create multiple DNS records using PowerShell

I had to create multiple A DNS records of the format below –

Just 9 records, so I could create them manually, but I thought I’d try creating them en masse using PowerShell. If you are on Windows Server 2012 and above, you have PowerShell cmdlets for DNS.

So I did the following –

To confirm they are created, the following helps –

Nice!

Create Solarwinds account limitations based on Custom Properties

I wanted to create account limitations in Solarwinds based on Custom Properties but the web console by default doesn’t give an option to do that.

[Screenshot: default limitations]

Then I came across this helpful video.

The trick is to log on to your Solarwinds server and find the “Account Limitation Builder”. Then click “Add” and create a new limitation similar to this –

[Screenshot: new limitation]

Now the limitation you create will come in the list –

[Screenshot: new limitation 2]

Select that, and in the next screen you can choose the value you want to limit to –

[Screenshot: new limitation 3]

Removing a monitored resource from multiple nodes in Solarwinds

Had to remove a drive from being monitored from multiple servers in Solarwinds. Rather than go to each node, edit its properties and untick the drive, I figured that if you do a search for the nodes and expand each of them it’s possible to tick the ones you don’t need and click “Delete”.

[Screenshot: multiple nodes]

Nice!

Using Solarwinds to monitor Windows Services

This is similar to how I monitored performance counters with Solarwinds.

I want to monitor a bunch of AppSense services.

Similar to the performance counters where I created an application monitor template so I could define the threshold values, here I need to create an application monitor so that the service appears in the alert manager. This part is easy (and similar to what I did for the performance counters) so I’ll skip and just put a screenshot like below –

[Screenshot: application monitor]

I created a separate application monitor template but you could very well add it to one of the standard templates (if it’s a service that’s to be monitored on all nodes for instance).

Now for the part where you create alerts.

Initially I thought this would be a case of creating triggers when the above application monitor goes down. Something like this –

[Screenshot: alert1]

And create an alert message like this –

[Screenshot: alert1a]

With this I was hoping to get one or more alert messages only for the services that actually went down. Instead, whenever any one service went down I’d get an alert for the service that went down and also a message for the services that were up. Am guessing since I was triggering on the application monitor, Solarwinds helpfully sent the status for each of its components – up or down.

[Screenshot: alert2]

The solution is to modify your trigger such that you target each component.

[Screenshot: alert3]

Now I get alerts the way I want.

Hope this helps!

Fixing a Windows Server that was stuck on “Preparing to configure Windows”

This is something that I fixed a few months ago at work but didn’t get a chance to blog about then. Coz of the gap I might not post as verbosely about it as I usually do.

The situation was that we had a Windows Server 2012 R2 that was stuck on a “Preparing to configure Windows” loop. I didn’t take a screenshot of it but you can find an example of it for Windows 7 here. All the usual troubleshooting steps like automatic repairs and last known good configuration etc didn’t make a dent. The problem began after the server was rebooted following Windows Updates so I focused on that. Here’s what I did to fix the server:

  1. I rebooted the server and pressed F8 before the Windows logo screen.
  2. This got me to the recovery options screen, where I selected the option to get a command prompt.
  3. Next I found the drive letter that corresponds to the C: drive of the server. This was through trial and error, by typing each drive letter and finding the one with the Windows folder.
    1. In my case the drive letter was E: so keep that in mind while viewing the screenshots. Replace it with the drive letter you find.
  4. I entered the following command to get a list of all Windows updates, both installed and pending. [Screenshot 1]
  5. The output was as below. [Screenshot 2]
  6. I noted the names of the updates that were in a “Staged” state and uninstalled them one by one. [Screenshot 3]
  7. Then I exited the command prompt. This rebooted the server and it was stuck at the following screen for a while. [Screenshot 4]
  8. I stayed on this screen for about 15 mins. The server rebooted itself and stayed on the screen again, for about 10 mins. After this it proceeded to the login screen and I could login as usual. :)

Find out which DCs in your domain have the DHCP service enabled

Use PowerShell –

Result is a table of DC names and the status of the “DHCP Server” service. If the service isn’t installed (i.e. the feature isn’t enabled) you get a blank.

Quickly get the last boot up time of a remote Windows machine

PowerShell:

Command Prompt/ WMI:

Double quotes are important for the WMI method.

Mute Solarwinds alerts during reboots/ maintenance windows

I wanted to mute Solarwinds alerts during our patch weekends when all servers are rebooted because they have to be and our mailboxes get flooded with Solarwinds alerts. I decided to use custom properties for this purpose. Here’s what I did.

Log in to the Solarwinds web console. Go to the “Settings” page, and then “Manage Custom Properties” under “Node & Group Management”.

Click “Add Custom Property”, select the default of “Nodes” from the drop down, and create something along the following lines –

[Screenshot: custom properties]

Select the nodes you’d like to apply this custom property to. I chose to apply it on all my Windows and VMware nodes. Set the value to be “No”.

[Screenshot: custom values]

Now login to Orion Alerts Manager and pick an alert you’d like to mute during patch weekends. Go to its “Alert Suppression” tab and add a condition on the custom property we created earlier.

[Screenshot: alert custom properties]

[Screenshot: alert custom properties 2]

And that’s it, really!

Update: Not sure why, but the above didn’t seem to work for me. So I added the Mute_Alerts check as part of the trigger condition itself.

[Screenshot: new trigger]

Note: If you don’t get the custom property in Orion, close and restart it as an administrator (i.e. right click and do “Run as Administrator” even if you are already running it with an admin account). Not sure why, but until I did that the custom property didn’t get picked up. You only need to do it one time; after that you can launch Orion normally.

Next time your server estate is being rebooted/ undergoing maintenance, login to Solarwinds webconsole and change the “Mute_Alerts” custom property to “Y” for a node/ nodes that you want to mute alerts for. Below I show how I will mute the alerts for all my Windows nodes.

Go to “Manage Nodes”. Group by “Vendor” and select Windows. Then select all nodes. (The checkbox to select all nodes got blanked out in the screenshot below but it’s easy to find).

[Screenshot: apply custom property]

Then click on “Custom Property Editor” to get to the screen below.

[Screenshot: apply custom property]

Here too select all the nodes and click “Edit multiple values”.

From the drop down, change the value for “Mute_Alerts” to “true”. Then save changes and that’s it. :)

 

vCenter unable to connect to hosts; vSphere client gives error ‘”ServiceInstance.RetrieveContent” for object “ServiceInstance” on Server “IP-Address” failed’

Our Network team had been making some changes at work and suddenly vCenter in our London office lost connectivity with all the ESX hosts in one of our remote offices. Moreover, when trying to connect from the vSphere Client to any of the remote hosts directly we were getting the following error –

[Screenshot: client error]

Connectivity from vSphere Client in the remote office to the ESX host in the same office was fine; it was only connectivity from other offices to this remote office. So it definitely indicated a network issue.

This KB article is a handy one to know what ports are required by various VMware products. Port 443 is what needs to be open to ESX hosts for vCenter Server to be able to talk to them. I did a telnet from the vCenter server to each of the remote office hosts on port 443 and it went through fine – so wasn’t a firewall issue. (Another post with port numbers, just FYI, is this one).

After a fair bit of troubleshooting we tracked the issue down to MTU.

Digressing into MTUs

Communication between two IP addresses (i.e. layer 3) happens through packets. Thus when my London vCenter Server communicates with my remote office ESX host, the two send TCP/IP packets to each other. When these packets from the vCenter Server reach the switch/ router on the same LAN as the ESX host, it becomes a layer 2 communication (because they are on the same network and it’s a matter of data reaching the ESX host from the switch/ router). In the case of Ethernet, this layer 2 communication happens via Ethernet frames. The frames encapsulate the IP packets – so the switch/ router places each packet (or each fragment, if the packet is too big for a single frame) in the data portion of a frame, while the ESX host receives these frames and extracts the packets (and vice versa). (The picture on this Wikipedia page is worth a look to see the encapsulation).

How much data can be held by a layer 2 frame is defined by the Maximum Transmission Unit (MTU). Larger MTUs are good because you can carry more data; but they have a downside in that each frame takes longer to be transmitted, and in case of any errors more data has to be re-transmitted when the frame is resent. So a balance is important. In the case of Ethernet, RFC 894 (see errata also) defines the MTU as a maximum of 1500 bytes. In the case of other layer 2 protocols, the MTU varies: for example 4464 bytes for Token Ring; 4352 bytes for FDDI; 9180 bytes for ATM; etc. In the case of Ethernet there are now also jumbo frames, which are frames with an MTU size of 9000 bytes (see this page for a table comparing regular frames and jumbo frames) and are commonly used in iSCSI networks.

Taking the case of Ethernet, assume the MTU of all Ethernet networks is 1500 bytes. So when two devices are conversing with each other over layer 3, and this conversation spans multiple Ethernet networks, it is helpful if the devices know that the MTU of the underlying layer 2 network is 1500 bytes. That way the two devices can keep the size of their layer 3 packets to no more than 1500 bytes. Why? Because if the size of a layer 3 packet is greater than 1500 bytes, then the sending device or one of the routers in between will have to fragment (break) the layer 3 packet into smaller packets of no more than 1500 bytes each to fit them in Ethernet frames. This is a waste of resources for all, so it’s best if the two devices know of the underlying layer 2 MTU and act accordingly.
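To get a feel for what fragmentation costs, here is a minimal Python sketch (my own illustration, not taken from any router implementation) that splits an IPv4 packet into fragments for a given MTU. It assumes a 20 byte IP header with no options, and the function name `fragment_payload_sizes` is made up for this example.

```python
def fragment_payload_sizes(packet_size, mtu, ip_header=20):
    """Return the payload bytes carried by each fragment of an IPv4 packet.

    packet_size includes the IP header. Assumes every fragment gets its own
    header of ip_header bytes (20 bytes, i.e. no IP options).
    """
    if mtu - ip_header < 8:
        raise ValueError("MTU too small to carry any fragment data")
    payload = packet_size - ip_header
    if packet_size <= mtu:
        return [payload]  # fits in one frame, no fragmentation needed
    # Every fragment except the last must carry a multiple of 8 payload bytes.
    max_chunk = ((mtu - ip_header) // 8) * 8
    sizes = []
    while payload > 0:
        chunk = min(max_chunk, payload)
        sizes.append(chunk)
        payload -= chunk
    return sizes


if __name__ == "__main__":
    print(fragment_payload_sizes(4000, 1500))
```

For a 4000 byte packet on a 1500 byte MTU link this gives three fragments carrying 1480, 1480 and 1020 bytes of payload, each with its own IP header.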

Now, note that Ethernet MTUs are defined as a maximum of 1500 bytes. So the MTU for a particular LAN segment can be set to a lower number for whatever reason (maybe there are additional fields in the Ethernet frame and to accommodate these the data portion must be reduced). Similarly, a layer 3 conversation between two devices can go over a mix of layer 2 networks – Ethernet, Token Ring, etc – each with a different MTU. So what the two devices really need is a way of knowing the lowest MTU across all these layer 2 networks, so they can use it as the maximum size of the layer 3 packets for their conversation. This is known as the Path MTU or IP MTU – and is basically the smallest of all the underlying layer 2 MTUs over which that conversation traverses. It is discovered through a process known as “Path MTU Discovery” (PMTUD) (check this Wikipedia article, or Google this term to learn more). Very briefly, in the case of IPv4 (RFC 1191) each device starts by assuming the Path MTU is the MTU of its own link, and sends packets with a flag set that says “do not fragment this packet”. Packets no larger than the lowest layer 2 MTU will get through. A packet that is too large for some link cannot be fragmented (due to the flag), so the router in front of that link drops it and sends an ICMP “fragmentation needed” message back to the sender, usually including the MTU of that link. The sender then lowers its packet size and tries again until packets get through. Thus the Path MTU is discovered. This check happens in both directions.
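Here is a rough Python sketch of the idea. It models the RFC 1191 style of discovery, where the sender starts at its own link MTU and each router that cannot forward a “do not fragment” packet reports the MTU of the link it could not use. The link MTU values and function names are invented for illustration; real stacks also deal with lost ICMP messages, timers and so on, which this ignores.

```python
def path_mtu(link_mtus):
    """The Path MTU is simply the smallest MTU of the links on the path."""
    return min(link_mtus)


def discover_path_mtu(link_mtus):
    """Simulate Path MTU Discovery over a list of link MTUs (sender side first).

    Returns (discovered_mtu, packet_sizes_tried).
    """
    estimate = link_mtus[0]  # start with the MTU of the sender's own link
    tried = [estimate]
    while True:
        # First link on the path too small for a packet of the current size.
        # In real life the router in front of it would send back an ICMP
        # "fragmentation needed" message carrying that link's MTU.
        blocker = next((mtu for mtu in link_mtus if mtu < estimate), None)
        if blocker is None:
            return estimate, tried
        estimate = blocker
        tried.append(estimate)


if __name__ == "__main__":
    links = [1500, 1400, 1500, 1300]  # made-up path
    print(path_mtu(links))
    print(discover_path_mtu(links))
```

With the made-up path above the sender tries 1500, then 1400, then 1300 bytes, and settles on 1300, which matches the minimum of the link MTUs.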

So we have layer 2 MTUs and layer 3 MTUs. Layer 2 MTUs have a maximum value that is dependent on the layer 2 network technology. But what about the minimum value? RFC 791, which defines the Internet Protocol (the IP in TCP/IP), requires that all devices supporting IP must be able to forward packets of 68 bytes without fragmenting (68 bytes because an IP header can take up to 60 bytes and the smallest fragment carries 8 bytes of data) and be able to accept packets of at least 576 bytes, either as one packet or as multiple fragments that require reassembling. Because of this the minimum layer 2 MTU can be thought of as 68 bytes. In a practical sense, however, most IP devices accept 576 bytes without fragmenting, and since the MTUs of the common layer 2 networks are at least this large, the minimum layer 2 & layer 3 MTU can be thought of as 576 bytes.

Just for completeness I will also mention Maximum Segment Size (MSS) which is a layer 4 MTU (of sorts) that defines what’s the maximum TCP segment (which is what a TCP packet is called) that can be accepted by devices. It has a default value of 536 bytes. This is based on the 576 bytes that IP requires hosts to accept at minimum, minus 20 bytes for IP headers and 20 bytes for TCP headers. Idea behind using 576 bytes as the base is that this way the TCP segment can be expected to arrive without fragmenting. In a practical sense again, for TCP/IP traffic over Ethernet (which is the common case), since Ethernet frames have an MTU of 1500, the MSS is usually set to 1500 minus 20 minus 20 = 1460 bytes.
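The arithmetic above is easy to put into a couple of lines of Python. This is only a sketch and it assumes 20 byte IP and TCP headers (i.e. no options on either); the function name is mine.

```python
IP_HEADER = 20   # bytes, IPv4 header without options
TCP_HEADER = 20  # bytes, TCP header without options


def mss_for_mtu(mtu):
    """Largest TCP payload that fits in one packet of the given MTU."""
    return mtu - IP_HEADER - TCP_HEADER


if __name__ == "__main__":
    for mtu in (576, 1500):
        print(mtu, "->", mss_for_mtu(mtu))
```

An MTU of 576 gives the default MSS of 536 bytes, and an Ethernet MTU of 1500 gives 1460 bytes.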

This is a good article I came upon. Just linking it as a reference to myself.

Back to our issue

In our case the router in the remote site had the following set in its configuration:

I am not entirely clear where it was set or why it was set, as that comes under the Network team. What this does though is tell the router not to clear the “Do Not Fragment” (DF) bit in IP packets. If the DF bit is set in a packet then the router will not fragment it if the packet is larger than the MTU (this is how PMTUD also works). I am not sure why this was set – part of some testing I suppose – but because of this larger packets were not getting through to the other side and hence failing. Our Network team removed this statement and then communication with the ESX hosts started working fine.

I wanted to write more about this statement but I am running out of time. This and this are two good links worth reading for more info. Especially the Scenario 4 section in the second link – that’s pretty much what was happening in our case, I think.

WMI Access Denied for remote machine etc

This isn’t going to be a coherent post really (unlike my usual posts which are more coherent, I hope!). I came across a bunch of new stuff as I was troubleshooting this WMI issue and thought I should put them all somewhere.

The issue is that we are trying to get Solarwinds to monitor one of our DMZ servers via WMI but it keeps failing.

[Screenshot: solarwinds]

Other servers in the DMZ work, it’s just this one that fails. WMI ports aren’t fixed like I had mentioned earlier but I don’t think that matters coz they aren’t fixed for the other servers either. Firewall configuration is the same between both servers.

I thought of running Microsoft Network Monitor but realized that it’s been replaced with Microsoft Message Analyzer. That looks nice and seems to do a lot more than just network monitoring – I must explore it sometime. For now I ran it on the DMZ server and applied a filter to see traffic from our Solarwinds server to it. The results showed that access was being denied, so maybe not a port issue after all.

[Screenshot: analyzer message]

Reading up more on this pointed me to a couple of suggestions. None of them helped but I’d like to mention them here for future reference.

First up is the command wbemtest. Run this from the local machine (the DMZ server in my case), click “Connect”, and then “Connect” again.

[Screenshot: wbemtest1]

If all is well with WMI you should get no error.

[Screenshot: wbemtest2]

Now try the same from the Solarwinds server, but this time try connecting to the DMZ server and enter credentials if any.

[Screenshot: wbemtest3]

That worked with the local administrator account but failed with the account I was using from Solarwinds to monitor the server. Error 0x80070005.

[Screenshot: wbemtest4]

So now I know the issue is one of permissions and not ports.

Another tool that can be used is WmiMgmt. Just type wmimgmt.msc somewhere to launch it. I tried this on the DMZ machine to confirm WMI works fine. I also tried it from SolarWinds machine to the DMZ machine. (Right click to connect to a remote computer).

[Screenshot: wmimgmt]

Problem with WmiMgmt is that unlike wbemtest you can’t specify a different account to use. If the two machines are domain joined or have the same local accounts, then it’s fine – you can “run as” that account on one machine and connect to the other – but otherwise there’s nothing much you can do. WmiMgmt is good to double check the permissions though. Click “Properties” in the screenshot above, go to the “Security” tab, select the “Root” namespace and click the “Security” button. The resulting window should show the permissions.

[Screenshot: wmimgmt2]

In my case the Administrators group members had full permissions as expected. The account I was using from the Solarwinds server was a member of this group too yet had access denied.

Another place to look at is Component Services > Computers > My Computer > DCOM Config > “Windows Management and Instrumentation” – right click and “Properties”.

[Screenshot: componentservices]

Make sure “Authentication Level” is “Default”. Then go to the “Security” tab and make sure the account/ group you want has permissions.

[Screenshot: componentservices2]

Also right click on “My Computer” and go to “Properties”.

[Screenshot: componentservices3]

Under the “COM Security” tab go to the “Edit Limits” of both access & launch and activation permissions and ensure the permissions are correct. My understanding is that the limits you specify here over-ride everything else.

[Screenshot: componentservices4]

In my case none of the above helped as they were all identical between both servers and at the correct setting. What finally helped was this Serverfault post.

Being DMZ servers these were on a workgroup and the Solarwinds server was connecting via a local account. Turns out that:

In a workgroup, the account connecting to the remote computer is a local user on that computer. Even if the account is in the Administrators group, UAC filtering means that a script runs as a standard user.

That is a good link to refer to. It is about Remote UAC and WMI. The solution is to go to the following registry key – HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\Policies\System – and create a new value called LocalAccountTokenFilterPolicy (of type DWORD, 32-bit) with a value of 1. (Check the Serverfault post for one more solution if you are using a Server 2012).

Remote UAC. Ah! Am surprised I didn’t think of that in the beginning itself. If LocalAccountTokenFilterPolicy has a value of 0 (the default) then whenever a member of the Administrators group connects remotely, the security tokens of that account are filtered to remove admin access. This is only for local admin accounts, not domain admin accounts – mind you. If LocalAccountTokenFilterPolicy has a value of 1, then no filtering happens and the account connects with full admin rights. I have encountered Remote UAC in the past when accessing admin shares (e.g. \\computer\drive$) in my home workgroup, never thought it would be playing a role here with WMI!
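If you’d rather script the registry change than click through regedit, something along these lines should do. It is only a sketch: the Python helper `build_reg_add_command` is a name I made up, it just assembles a command line for the standard `reg add` tool, and you would have to run the result from an elevated prompt on the target server.

```python
import subprocess
import sys

KEY = r"HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\Policies\System"
VALUE_NAME = "LocalAccountTokenFilterPolicy"


def build_reg_add_command(disable_filtering=True):
    """Build the 'reg add' command that sets LocalAccountTokenFilterPolicy.

    1 turns Remote UAC token filtering off for local admin accounts,
    0 restores the default (filtered) behaviour.
    """
    data = "1" if disable_filtering else "0"
    return ["reg", "add", KEY, "/v", VALUE_NAME,
            "/t", "REG_DWORD", "/d", data, "/f"]


if __name__ == "__main__":
    command = build_reg_add_command()
    print(" ".join(command))
    # Only attempt the change on Windows; elsewhere just show the command.
    if sys.platform == "win32":
        subprocess.run(command, check=True)
```

Keep in mind that setting the value to 1 switches Remote UAC filtering off for every local admin account on that machine, not just the monitoring account.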

Hope this helps someone. :)

Using Solarwinds to monitor Windows Performance Monitor (perfmon) Counters

Had a request from our Exchange admin to setup Solarwinds alerts for some of our Exchange servers based on Performance Monitor counters.

MSExchangeTransport Queues(_total)\Active Remote Delivery Queue Length       (above 200)
MSExchangeTransport Queues(_total)\Largest Delivery Queue Length                 (above 200)
MSExchangeTransport Queues(_total)\Messages Queued For Delivery                (above 200)
MSExchangeTransport Queues(_total)\Retry Remote Delivery Queue Length        (above 20)

Before setting up alerts I need to add them to Solarwinds first. Here’s how you do that.

First, open up the Solarwinds web console, go to Applications, and then SAM Settings.

[Screenshots: applications; SAM settings]

Then go to Component Monitor Wizard.

[Screenshot: component monitor]

Select Windows Performance Counter Monitor.

[Screenshot: perfmon]

Notice that it says the data is collected using RPC. This means (1) the server must be monitored by Solarwinds using WMI and not SNMP. In case of the latter, switch to monitoring via WMI. And (2) RPC ports must be open between the Solarwinds server and the target server. If not, monitoring will fail.

Enter the name of a server you wish to target. This server would be one that contains the perfmon counters you are interested in. You use this server to setup monitoring for the counters you are interested in. Change to 64bit if 32bit doesn’t work.

[Screenshot: target]

Change the “Choose Credential” drop down according to your environment. To select the server it’s better to click “Browse” and find the server you are interested in if Solarwinds complains that it cannot find the name you type in.

Note: The next step will fail if you have not opened the required RPC ports.

Select the counters you are interested in. First select the object you want to monitor (MSExchangeTransport Queues, in the screenshot below) and then the counters.

[Screenshot: select counters]

The next screen will list all the counters you selected and give you a chance to set warning and critical thresholds. Customize these.

[Screenshot: properties]

Select where you would like these counters added to – a new application monitor/ monitor template, or an existing application monitor/ monitor template. I am going with a new application monitor template. Easier to make changes to templates than individual application monitors.

[Screenshot: where to add]

Choose more nodes you would like to assign this application monitor to. Am skipping this screenshot. This step is optional as you can assign the application monitor to nodes later too.

An optional step – I also went to Manage Application Templates screen after the above steps, selected the template I created, and assigned it some tags and set a custom view.

[Screenshot: define view]

A custom view lets you define what details are shown when anyone clicks this application monitor template on a particular node in the Solarwinds web console. You can customize the view by going to Settings (of Solarwinds) and selecting Manage Views.

Next step is to create an alert. For that you have to log on to the Solarwinds server itself, go to Alert Manager, and create a new alert (skipping screenshots for all these) whose condition is as follows:

[Screenshot: solarwinds trigger]

Note that the type of property to monitor is “APM: Component”. This is important for the correct variables to be visible in the alert message. Also, note that I am triggering for each component (with an “any” condition) and not for the application monitor itself. This lets me get alerts for individual components; if I don’t do this, and instead trigger on the application monitor itself, I will get alert emails for each component including the ones that don’t have an issue.

Here’s the alert message:

[Screenshot: solarwinds message]

PowerShell regexp match with lines above and below

Wasn’t aware of this until today when I needed to do this. The Select-String cmdlet in PowerShell can select strings based on a regexp (similar to findstr in regular command prompt) with the added benefit that it can also return context. Which is to say you can return the line that matches your pattern and also lines above or below it. Pretty cool!

In my case I needed to scan the output of portqry.exe and also get the UUIDs of the lines that match. These UUIDs are shown on the line above so I did something like this:

The -Context 1,0 switch is the key here. The first number tells the cmdlet how many lines before to show, the second number how many lines after.
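To make the idea of “context” concrete, here is a small Python sketch that does roughly what -Context does: return each matching line along with some lines before and after it. This is not how Select-String is implemented, the function name is mine, and the sample lines are invented placeholders rather than real portqry.exe output. Unlike Select-String, the match here is case-sensitive.

```python
import re


def select_with_context(lines, pattern, before=0, after=0):
    """Return one list per match: the lines before it, the match, the lines after."""
    regex = re.compile(pattern)
    results = []
    for index, line in enumerate(lines):
        if regex.search(line):
            start = max(0, index - before)
            results.append(lines[start:index + after + 1])
    return results


if __name__ == "__main__":
    # Invented sample text, only loosely shaped like endpoint mapper output.
    sample = [
        "UUID: 1111",
        "ncacn_ip_tcp:server[5000]",
        "UUID: 2222",
        "ncacn_np:server[pipe]",
        "UUID: 3333",
        "ncacn_ip_tcp:server[5001]",
    ]
    for block in select_with_context(sample, r"ncacn_ip_tcp", before=1):
        print(block)
```

With before=1 and after=0 (the equivalent of -Context 1,0) each result holds the UUID line followed by the line that matched.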

Notes on WMI ports & monitoring

Trying to set up monitoring for some of our Windows DMZ servers via SolarWinds and came across a few interesting links. At the same time I noticed that my carefully organized bookmarks folders seem to be corrupt. Many folders are empty. This happened a few days ago too, but that time it was just one folder (well one folder that I knew of, could be more who knows) and so I was able to view an older copy of my bookmarks via Xmarks and add the missing entries back.

But this time it’s a whole bunch of folders and the only option Xmarks has is to either export the older copy or overwrite your current copy with this older set. I don’t want the latter as that would mean losing all my newer bookmarks. Wish there was some way of merging the current and older copies! Anyhow, what’s happened has happened, I think I’ll stick to using this blog for bookmarks. I keep referring to this blog over my bookmarks anyway, so this is a sign to stop with the unnecessary filing.

To start off, this is a must read on WMI ports and how to allow firewall exceptions for WMI. Gist of the matter is that WMI uses dynamic ports via the RPC Portmapper. When the Solarwinds server (for example) wants to talk to WMI on a target server, it contacts the RPC Portmapper service on the target server on port 135 (which is the standard port for the Portmapper service) and gets a dynamic port to use for WMI. This port can be anywhere between 1024 – 65535.

The fix for this is to give the Portmapper service a specific set of ports to use. One method is to use the registry (see the previous link or this KB article). Add a key called Internet under HKEY_LOCAL_MACHINE\Software\Microsoft\Rpc. To this add values Ports (REG_MULTI_SZ), PortsInternetAvailable (REG_SZ), and UseInternetPorts (REG_SZ). Set a value of Y for the latter two, and a range like 5000-5100 for the first. Restart the server after this.

Although I haven’t tried it, I think a similar effect as the above can be achieved via Component Services (type dcomcnfg.exe in a command prompt). Expand the “Computers” folder here, right click on “My Computer”, go to “Default Protocols”, click “Properties” of “Connection-oriented TCP/IP”, and add a port range.

[Screenshot: dcomcnfg]

Another method is to use Group Policies.

Yet another method seems to be to get WMI to not use the RPC Portmapper for dynamic ports. By default WMI runs as a shared service, which is why it uses the RPC Portmapper. It is possible to make it run as a standalone service so it doesn’t use the Portmapper and instead defaults to port 24158. (This port number too can be changed via dcomcnfg.exe but I am not sure how).


These two links didn’t make much sense to me, but I know they are of use so linking them here as a reference to myself for later:

 

What does the vCloud Air “Enable Service Network” do?

I don’t know. :)

But I think it’s used when you want to connect a vCloud Air network (which is part of a Disaster Recovery or Virtual Private Cloud setup) with the network of a vCloud Air PaaS such as vCloud Air SQL. I base this on this vCloud Air SQL Users Guide that talks about enabling the service network to connect a vDC (vCloud Air Virtual Datacenter) to the vCloud Air SQL network.

Will add more to this post if & when I get to know more.

Troubleshooting ESXi host reboots

Had to troubleshoot an ESXi host reboot today. Came across this link – good one.

Here’s what I did though after the host reboot.

Once the host was online I connected to it via the vSphere client. I didn’t connect to the host directly (though you can do that too). I connected to the vCenter, then navigated to that host, went to the File menu and exported the system logs.

[Screenshot: export system logs]

This creates a zip file containing another archive. I extracted the contents of this into a folder. The root of that folder has the usual Linux filesystem structure.

[Screenshot: directory structure]

I went into the var folder here. (The log subfolder has many logs but most of these might be from after the reboot. If that’s the case, check the run/log subfolder).

In my case the /var/log/vmksummary.log file had entries for when the host rebooted. None of the other files mentioned anything.

Then I went to the /var/run/log folder via PowerShell and ran a grep for the word reboot –

Lots of messages indicating that the host was rebooted via the DCUI (lines 2, 4, 5, and 12). Thus I realized someone had manually rebooted the host.
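If you don’t have grep handy, a few lines of Python can do a similar search over the extracted log folder. This is just a sketch of the idea and not the command I ran; the function name `find_lines` and the choice to look only at files ending in .log are my own assumptions.

```python
from pathlib import Path


def find_lines(folder, word="reboot"):
    """Search every *.log file under folder for word (case-insensitive).

    Returns a list of (file name, line number, line text) tuples.
    """
    hits = []
    for path in sorted(Path(folder).rglob("*.log")):
        with open(path, encoding="utf-8", errors="replace") as handle:
            for number, line in enumerate(handle, start=1):
                if word.lower() in line.lower():
                    hits.append((path.name, number, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    # Point this at wherever you extracted the var/run/log folder.
    for name, number, text in find_lines("."):
        print(f"{name}:{number}: {text}")
```

Printing the file name and line number alongside each hit makes it easier to go back to the right log and read around the match.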