Contact

Subscribe via Email

Subscribe via RSS/JSON

Categories

Recent Posts

Creative Commons Attribution 4.0 International License
© Rakhesh Sasidharan

Elsewhere

Enabling SNMPv3 on ESXi hosts

A continuation to my earlier post which was to do with SNMPv2.

As before, connect to the vCenter via PowerCLI. And as before the set() method can be used to set SNMP – both v2 and/or v3. The definition of this method is as follows:

That’s confusing so best to copy paste the definition into notepad or something so you can be sure you are passing the correct arguments.

First things first. There doesn’t seem to be a way of turning off something. As in, say you already have SNMPv2 turned on, you can’t turn it off by setting the community strings to blank. Doing so generates an error. So if you want to turn previous things off it’s best to do a reset and start with a clean slate.

This sets things back to their defaults:

Before going ahead with any SNMPv3 configuration we need to decide on what authentication and privacy protocols to use. In my case I want to use SHA1 and AES-128. So I need to set that first:

Once I have done this I can generate the hashes. I will need this later to configure SNMPv3.

In the example above both my passwords are Password1.

With this in hand I configure SNMPv3:

That’s it really. In the above example I will be using an SNMPv3 user called snmpUser1.

Now to do it across my estate I can make a loop. No need to create password hashes for each host. The hash stays the same as long as you are using the same password for each host.

That’s all!

Solarwinds not seeing correct disk size; “Connection timeout. Job canceled by scheduler.” errors

Had this issue at work today. Notice the disk usage data below in Solarwinds –

Disk Usage

The ‘Logical Volumes’ section shows the correct info but the ‘Disk Volumes’ section shows 0 for everything.

Added to that all the Application Monitors had errors –

Timeout

I searched Google on the error message “Connection timeout. Job canceled by Scheduler.” and found this Solarwinds KB article. Corrupt performance counters seemed to be a suspect. That KB article was a bit confusing me to in that it gives three resolutions and I wasn’t sure if I am to do all three or just pick and choose. :)

Event Logs on the target server did show corrupt performance counters.

Initial Errors

I tried to get the counters via PowerShell to double check and got an error as expected –

Broken Get-Counter

Ok, so performance counter issue indeed. Since the Solarwinds KB article didn’t make much sense to me I searched for the Event ID 3001 as in the screenshot and came across a TechNet article. Solution seemed simple – open up command prompt as an admin, run the command lodctr /R. This command apparently rebuilds the performance counters from scratch based on currently registry settings adn backup INI files (that’s what the help message says). The command completed straight-forwardly too.

lodctr - 1

With this the performance counters started working via PowerShell.

Working Get-Counter

Event Logs still had some error but those were to do with the performance counters of ASP.Net and Oracle etc.

More Errors

The fix for this seemed to be a bit more involved and requires rebooting the server. I decided to skip it for now as I don’t these additional counters have much to do with Solarwinds. So I let those messages be and tried to see if Solarwinds was picking up the correct info. Initially I took a more patient approach of waiting and trying to make it poll again; then I got impatient and did things like removing the node from monitoring and adding it back (and then wait again for Solarwinds to poll it etc) but eventually it began working. Solarwinds now sees the disk space correctly and all the Application Monitors work without any errors too.

Here’s what I am guessing happened (based on that Solarwinds KB article I linked to above). The performance counters of the server got corrupt. Solarwinds uses counters to get the disk info etc. Due to this corruption the poller spent more time than usual when fetching info from the server. This resulted in the Application Monitor components not getting a chance to run as the poller had run out of time to poll the server. Thus the Application Monitors gave the timeout errors above. In reality the timeout was not from those components, it was from the corrupt performance counters.

Enabling SNMP on ESXi hosts

I wanted to enable SNMP on our ESXi hosts for monitoring via Solarwinds. Here’s what I did. (I am doing this kind of generically, using variables etc, so I can script the thing for multiple hosts).

First I connected to the vCenter Server from PowerCLI.

Next I got its ESXCLI object. This will let me run ESXCLI commands against the host.

To view the current status of SNMP you can do can invoke a get() method –

Nothing’s configured currently. To configure something we can use the set() method. From the definition of this method we can see it takes a whole bunch of parameters –

Here’s what I did to configure SNMP. I want a community string of “public”, enable SNMP, and specify two trap destinations.

The result of that will either be a true or false. The get() method can be used again to confirm it is set correctly. And the test() method can be used to test it works –

Now Solarwinds will be able to poll the host via SNMP.

To do this en-masse on all your hosts the following should help –

Shout out to this VMware blog post which helped a lot and has more info.

The above script failed on some of our ESX hosts with the following error –

Turns out these hosts only accept 16 parameters instead of 17 (the one called largestorage is missing). Not sure why. All our hosts are ESXi 5.5 but am thinking the problem ones are perhaps not using the HP customized version of ESXi.

Anyways, so I modified my script above to take care of this –

Also, just for my own info – the $null above means the parameter is not set. If that parameter already has a value on the server it is not over-written. To over-write or blank out the existing value replace $null with "".

Using SolarWinds to highlight servers in a pending reboot status

Had a request to use SolarWinds to highlight servers in a pending reboot status. Here’s what I did.

Sorry, this is currently broken. After implementing this I realized I need to enable PowerShell remoting on all servers for it to work, else the script just returns the result from the SolarWinds server. Will update this post after I fix it at my workplace. If you come across this post before that, all you need to do is enable PowerShell remoting across all your servers and change the script execution to “Remote Host”.

SolarWinds has a built in application monitor called “Windows Update Monitoring”. It does a lot more than what I want so I disabled all the components I am not interested in. (I could have also just created a new application monitor, I know, just was lazy).

winupdatemon-1

The part I am interested in is the PowerShell Monitor component. By default it checks for the reboot required status by checking a registry key: HKLM:\SOFTWARE\Microsoft\Windows\CurrentVersion\WindowsUpdate\Auto Update\RebootRequired. Here’s the default script –

Inspired by this blog post which monitors three more registry keys and also queries ConfigMgr, I replaced the default PowerShell script with the following –

Then I added the application monitor to all my Windows servers. The result is that I can see the following information on every node –

winupdatemon-2

Following this I created alerts to send me an email whenever the status of the above component (“Machine restart status …”) went down for any node. And I also created a SolarWinds report to capture all nodes for which the above component was down.

winupdatemon-3

Then I assigned this to a schedule to run once in a month after our patching window to email me a list of nodes that require reboots.

 

Solarwinds AppInsight for IIS – doing a manual install – and hopefully fixing invalid signature (error code: 16007)

AppInsight from Solarwinds is pretty cool. At least the one for Exchange is. Trying out the one for IIS now. Got it configured on a few of our servers easily but it failed on one. Got the following error –

appinsight-error

Bummer!

Manual install it is then. (Or maybe not! Read on and you’ll see a hopeful fix that worked for me).

First step in that is to install PowerShell (easy) and the IIS PowerShell snap-in. The latter can be downloaded from here. This downloads the Web Platform Installer (a.k.a. “webpi” for short) and that connects to the Internet to download the goods. In theory it should be easy, in practice the server doesn’t have connectivity to the Internet except via a proxy so I have to feed it that information first. Go to C:\Program Files\Microsoft\Web Platform Installer for that, find a file called WebPlatformInstaller.exe.config, open it in Notepad or similar, and add the following lines to it –

This should be within the <configuration> -- </configuration> block. Didn’t help though, same error.

webpi-error

Time to look at the logs. Go to %localappdata%\Microsoft\Web Platform Installer\logs\webpi for those.

From the logs it looked like the connection was going through –

But the problem was this –

If I go to the link – https://www.microsoft.com/web/webpi/5.0/webproductlist.xml – via IE on that server I get the following –

untrusted-cert

 

However, when I visit the same link on a different server there’s no error.

Interesting. I viewed the untrusted certificate from IE on the problem server and compared it with the certificate from the non-problem server.

Certificate on the problem server

Certificate on the problem server

Certificate on a non-problem server

Certificate on a non-problem server

Comparing the two I can see that the non-problem server has a VeriSign certificate in the root of the path, because of which there’s a chain of trust.

verisign - g5

If I open Certificate Manager on both servers (open mmc > Add/ Remove Snap-Ins > Certificates > Add > Computer account) and navigate to the “Trusted Root Certification Authorities” store) on both servers I can see that the problem server doesn’t have the VeriSign certificate in its store while the other server has.

cert manager - g5

So here’s what I did. :) I exported the certificate from the server that had it and imported it into the “Trusted Root Certification Authorities” store of the problem server. Then I closed and opened IE and went to the link again, and bingo! the website opens without any issues. Then I tried the Web Platform Installer again and this time it loads. Bam!

The problem though is that it can’t find the IIS PowerShell snap-in. Grr!

no snap-in

no snap-in 2

That sucks!

However, at this point I had an idea. The SolarWinds error message was about an invalid signature, and what do we know of that can cause an invalid signature? Certificate issues! So now that I have installed the required CA certificate for the Web Platform Installer, maybe it sorts out SolarWinds too? So I went back and clicked “Configure Server” again and bingo! it worked this time. :)

Hope this helps someone.

Solarwinds – “The WinRM client cannot process the request”

Added the Exchange 2010 Database Availability Group application monitor to couple of our Exchange 2010 servers and got the following error –

error1

Clicking “More” gives the following –

error2

This is because Solarwinds is trying to run a PowerShell script on the remote server and the script is unable to run due to authentication errors. That’s because Solarwinds is trying to connect to the server using its IP address, and so instead of using Kerberos authentication it resorts to Negotiate authentication (which is disabled). The error message too says the same but you can verify it for yourself from the Solarwinds server too. Try the following command

This is what’s happening behind the scenes and as you will see it fails. Now replace “Negotiate” with “Kerberos” and it succeeds –

So, how to fix this? Logon to the remote server and launch IIS Manager. It’s under “Administrative Tools” and may not be there by default (my server only had “Internet Information Services (IIS) 6.0 Manager”), in which case add it via Server Manager/ PowerShell –

Then open IIS Manager, go to Sites > PowerShell and double click “Authentication”.

iis-1

Select “Windows Authentication” and click “Enable”.

iis-2

Now Solarwinds will work.

Create Solarwinds account limitations based on Custom Properties

I wanted to create account limitations in Solarwinds based on Custom Properties but the web console by default doesn’t give an option to do that.

default limitationsThen I came across this helpful video.

The trick is to logon to your Solarwinds server and find the “Account Limitation Builder”. Then click “Add” and create a new limitation similar to this –

new limitation

Now the limitation you create will come in the list –

new limitation2

Select that, and in the next screen you can choose the value you want to limit to –

new limitation3

Removing a monitored resource from multiple nodes in Solarwinds

Had to remove a drive from being monitored from multiple servers in Solarwinds. Rather than go to each node, edit its properties and untick the drive, I figured that if you do a search for the nodes and expand each of them it’s possible to tick the ones you don’t need and click “Delete”.

multiple nodes

Nice!

Using Solarwinds to monitor Windows Services

This is similar to how I monitored performance counters with Solarwinds.

I want to monitor a bunch of AppSense services.

Similar to the performance counters where I created an application monitor template so I could define the threshold values, here I need to create an application monitor so that the service appears in the alert manager. This part is easy (and similar to what I did for the performance counters) so I’ll skip and just put a screenshot like below –

applimonitor

I created a separate application monitor template but you could very well add it to one of the standard templates (if it’s a service that’s to be monitored on all nodes for instance).

Now for the part where you create alerts.

Initially I thought this would be a case of creating triggers when the above application monitor goes down. Something like this – alert1

And create an alert message like this –

alert1a

With this I was hoping to get a one or more alert messages only for the services that actually went down. Instead, what happened is that whenever any one service went down I’d get an alert for the service that went down and also a message for the services that were up. Am guessing since I was triggering on the application monitor, Solarwinds helpfully sent the status for each of its components – up or down.

alert2

The solution is to modify your trigger such that you target each component.

alert3

Now I get alerts the way I want.

Hope this helps!

Mute Solarwinds alerts during reboots/ maintenance windows

I wanted to mute Solarwinds alerts during our patch weekends when all servers are rebooted because they have to be and our mailboxes get flooded with Solarwinds alerts. I decided to use custom properties for this purpose. Here’s what I did.

Login to the Solarwinds web console. Go to the “Settings” page, and then “Manage Custom Properties” under “Node *& Group Management”.

Click “Add Custom Property”, select the default of “Nodes” from the drop down, and create something along the following lines –

customproperties

Select the nodes you’d like to apply this custom property to. I chose to apply it on all my Windows and VMware nodes. Set the value to be “No”.

customvalues

Now login to Orion Alerts Manager and pick an alert you’d like to mute during patch weekends. Go to its “Alert Suppression” tab and add a condition on the custom property we created earlier.

alert custom properties

alert custom properties2

And that’s it, really!

Update: Not sure why, but the above didn’t seem to work for me. So I added the Mute_Alerts check as part of the trigger condition itself.

new trigger

Note: If you don’t get the custom property in Orion, close and restart it as an administrator (i.e. right click and do “Run as Administrator” even if you are already running it with an admin account). Not sure why, but until I did that the custom property didn’t get picked up. You only need to do it one time; after that you can launch Orion normally.

Next time your server estate is being rebooted/ undergoing maintenance, login to Solarwinds webconsole and change the “Mute_Alerts” custom property to “Y” for a node/ nodes that you want to mute alerts for. Below I show how I will mute the alerts for all my Windows nodes.

Go to “Manage Nodes”. Group by “Vendor” and select Windows. Then select all nodes. (The checkbox to select all nodes got blanked out in the screenshot below but it’s easy to find).

apply custom property

Then click on “Custom Property Editor” to get to the screen below.

apply custom property

Here too select all the nodes and click “Edit multiple values”.

From the drop down, change the value for “Mute_Alerts” to “true”. Then save changes and that’s it. :)

 

Using Solarwinds to monitor Windows Performance Monitor (perfmon) Counters

Had a request from our Exchange admin to setup Solarwinds alerts for some of our Exchange servers based on Performance Monitor counters.

MSExchangeTransport Queues(_total)\Active Remote Delivery Queue Length       (above 200)
MSExchangeTransport Queues(_total)\Largest Delivery Queue Length                 (above 200)
MSExchangeTransport Queues(_total)\Messages Queued For Delivery                (above 200)
MSExchangeTransport Queues(_total)\Retry Remote Delivery Queue Length        (above 20)

Before setting up alerts I need to add them to Solarwinds first. Here’s how you do that.

First, open up the Solarwinds web console, go to Applications, and then SAM Settings.

applicationssam settings

Then go to Component Monitor Wizard.

component monitor

 

Select Windows Performance Counter Monitor.

perfmon

Notice that it says the data is collected using RPC. This means (1) the server must be monitored by Solarwinds using WMI and not SNMP. In case of the latter, switch to monitoring via WMI. And (2) RPC ports must be open between the Solarwinds server and the target server. If not, monitoring will fail.

Enter the name of a server you wish to target. This server would be one that contains the perfmon counters you are interested in. You use this server to setup monitoring for the counters you are interested in. Change to 64bit if 32bit doesn’t work.

target

Change the “Choose Credential” drop down according to your environment. To select the server it’s better to click “Browse” and find the server you are interested in if Solarwinds complains that it cannot find the name you type in.

Note: The next step will fail if you have not opened the required RPC ports.

Select the counters you are interested in. First select the object you want to monitor (MSExchangeTransport Queues, in the screenshot below) and then the counters.

select counters

The next screen will list all the counters you selected and give you a chance to set warning and critical thresholds. Customize these.

 

properties

Select where you would like these counters added to – a new application monitor/ monitor template, or an existing application monitor/ monitor template. I am going with a new application monitor template. Easier to make changes to templates than individual application monitors.

whereadd

 

Choose more nodes you would like to assign this application monitor to. Am skipping this screenshot. This step is optional as you can assign the application monitor to nodes later too.

An optional step – I also went to Manage Application Templates screen after the above steps, selected the template I created, and assigned it some tags and set a custom view.

defineview

A custom view lets you define what details are shown when anyone clicks this application monitor template on a particular node in the Solarwinds web console. You can customize the view by going to Settings (of Solarwinds) and selecting Manage Views.

Next step is to create an alert. For that you have to logon to the Solarwinds server itself, go to Alert Manager, create a new alert (skipping screenshots for all these) and create a new alert whose condition is as follows:solarwinds trigger

Note that the type of property to monitor is “APM: Component”. This is important for the correct variables to be visible in the alert message. Also, note that I am triggering for each of the component (with an “any” condition) and not for the application monitor itself. This lets me get alerts for individual components; if I don’t do this, and instead trigger on the application monitor itself, I will get alert emails for each component including the ones that don’t have an issue.

Here’s the alert message:

solarwinds message