Certificates in the time of Let’s Encrypt

Here’s me generating two certs – one for “edge.raxnet.global” (with a SAN of “mx.raxnet.global”), another for “adfs.raxnet.global”. Both are “public” certificates, using Let’s Encrypt. 

Easy peasy!

And here are the files themselves:

The password for all these is ‘poshacme’.

I don't know if I will ever use this at work but I was reading up on Let's Encrypt and ACME Certificate Authorities and decided to play with it for my home lab. A bit of work went on behind the scenes for the above stuff, so here are some notes.

First off, ACME Certificates are all about automation. You get certificates that are valid for only 90 days, and the idea is that every 60 days you renew them automatically. So the typical approach of a CA verifying your identity via email doesn’t work. All verification is via automated methods like a DNS record or HTTP query, and you need some tool to do all this for you. There’s no website where you go and submit a CSR or get a certificate bundle to download. Yeah, shocker! In the ACME world everything is via clients, a list of which you can find here. Most of these are for Linux or *nix, but there are a few Windows ones too (and if you use a web server like Caddy you even get HTTPS out of the box with Let’s Encrypt). 

To dip my toes in I started with Certify The Web, a GUI client for all this. It was fine, but it didn't hook me much, so I moved to Posh-ACME, a pure PowerShell CLI tool. I've liked this so far.

Apart from installing the tool via something like Install-Module -Name Posh-ACME there’s some background work that needs to be done. A good starting point is the official Quick Start followed by the Tutorial. What I go into below is just a rehash for myself of what’s already in the official docs. :)

Requesting a certificate via Posh-ACME is straightforward. Essentially, if I want a certificate for a domain ’sso.raxnet.global’ I would do something like this: 
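
(A sketch based on the Posh-ACME quick start; with no plugin specified it uses the manual DNS method, and the contact address becomes the ACME account.)

```powershell
# Request a cert for sso.raxnet.global using the default manual DNS plugin;
# -AcceptTOS agrees to the Let's Encrypt terms, -Contact sets the ACME account email.
New-PACertificate 'sso.raxnet.global' -AcceptTOS -Contact 'admin@rakhesh.com' -Verbose
```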

At this point I'd have to go and make the specified TXT record. The command will wait 2 mins and then check for the existence of the TXT record. Once it finds it, the certificates are generated and all is good. (If it doesn't find the record it keeps waiting; or I can Ctrl+C to cancel it.) My ACME account will be admin@rakhesh.com and the certificate will be tied to that.

If I want to be super lazy, this is all I need to do! :) Run this command every 60 days or so (worst case every 89 days), update the _acme-challenge.domain TXT record as requested (the random value changes each time), and bam! I am done. 

If I want to automate it however I need to do some more stuff. Specifically, 1) you need to be on a DNS provider that gives you an API to update its records, and 2) hopefully said DNS provider is on the Posh-ACME supported list. If so, all is good. I use Azure DNS for my domain, and instructions for using Azure DNS are already in their documentation. If I were on a DNS provider that didn't have APIs, or for whatever reason wanted to use a different DNS provider to my main domain, I can even make use of CNAMEs. I like this CNAME idea, so even though I could have used my primary zone hosted in Azure DNS, I decided to make another zone in Azure DNS and go down the CNAME route.

So, here's how the CNAME thing works. Notice above that Posh-ACME asked me to create a record called _acme-challenge.sso.raxnet.global? Basically, for every domain you are requesting a certificate for (including any name in the Subject Alternative Name (SAN)), you need to create a _acme-challenge.<domain> TXT record with the random challenge given by ACME. However, you can also have a separate domain – say 'acme.myotherdomain' – and pre-create CNAME records like _acme-challenge.<whatever>.mymaindomain -> <whatever>.myotherdomain. When the validation process looks for _acme-challenge.<whatever>.mymaindomain it will follow the CNAME through to <whatever>.myotherdomain and update & verify the record there. So my main domain never gets touched by any automatic process; only this other domain that I set up (which can even be a sub-domain of my main domain) is where all the automatic action happens.

In my case I created a CNAME from _acme-challenge.sso.raxnet.global to sso.acme.raxnet.global (where acme.raxnet.global is my Azure DNS hosted zone). I have to create the CNAME record beforehand, but I don't need to make any TXT records in the acme.raxnet.global zone – that happens automatically.

To automate things I then made a service account (aka an "App registration" in Azure-speak) whose credentials I could pass on to Posh-ACME, and whose rights were restricted. The Posh-ACME documentation has steps on creating a custom role in Azure that can only update TXT records; I was a bit lazy here and simply made a new App Registration via the Azure portal and granted it "DNS Contributor" rights on the zone.

[screenshot]

[screenshot]

Not shown in the screenshot, after creating the App Registration I went to its settings and also assigned a password. 

[screenshot]

That done, the next step is to collect various details such as the subscription ID, tenant ID, and the App Registration name and password into a variable. Something like this:
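
(This is roughly what it looks like – the parameter names are per the Posh-ACME Azure plugin guide, so double-check them against the current docs, and the placeholder GUIDs obviously need replacing.)

```powershell
# Details of the Azure subscription, tenant, and the App Registration that Posh-ACME will use.
# Get-Credential prompts for the App Registration's application ID (username) and its password.
$azParams = @{
    AZSubscriptionId = 'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx'   # subscription hosting the DNS zone
    AZTenantId       = 'xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx'   # Azure AD tenant ID
    AZAppCred        = (Get-Credential)                         # App Registration ID + password
}
```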

This is a one time thing as the credentials and details you enter here are then stored in the local profile. This means renewals and any new certificate requests don’t require the credentials etc. to be passed along as long as they use the same DNS provider plugin. 

That's it really. This is the big thing you have to do to make the DNS part automated. Assuming I have already filled in the $azParams hash-table as above (by copy-pasting it into a PowerShell window after filling in the details and then entering the App Registration name and password when prompted) I can request a new certificate thus:
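
(A sketch, using the same hash-table as above; the domain and alias match the sso example discussed in the key points below.)

```powershell
# Request a cert for sso.raxnet.global, letting the Azure plugin write the TXT record
# to the alias zone (sso.acme.raxnet.global) instead of the main domain.
New-PACertificate 'sso.raxnet.global' -AcceptTOS -Contact 'admin@rakhesh.com' `
    -DnsPlugin Azure -PluginArgs $azParams -DnsAlias 'sso.acme.raxnet.global' -Verbose
```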

Key points:

  • The -DnsPlugin switch specifies that I want to use the Azure plugin
  • The -PluginArgs switch passes along the arguments this plugin expects; in this case the credentials etc. I filled into the $azParams hash-table
  • The -DnsAlias switch specifies the CNAME records to update; you specify one for each domain. For example, in this case 'sso.raxnet.global' will be aliased to 'sso.acme.raxnet.global', so the latter is what the DNS plugin will go and update. If I specified two domains, e.g. 'sso.raxnet.global','sso2.raxnet.global' (an array of domains), then I would have had to specify two aliases 'sso.acme.raxnet.global','sso2.acme.raxnet.global', OR I could specify just one alias 'sso.acme.raxnet.global', provided I have created CNAMEs from both domains to this same entry, and the plugin will use this alias for both domains. My first example at the beginning of this post does exactly that.

That's it! To renew my certs I have to use the Submit-Renewal cmdlet. I don't even need to run it manually. All I need to do is create a scheduled task to run Submit-Renewal -AllAccounts to renew all my certificates tied to the current profile (so if I have certificates under two different accounts – e.g. admin@rakhesh.com and admin2@rakhesh.com – but both are under the same Windows account where I am running this cmdlet from, both accounts would have their certs renewed).
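
For example, something like this to register a daily task (a sketch – adjust the time and task name to taste):

```powershell
# A daily scheduled task that imports Posh-ACME and renews every certificate
# under every ACME account in the current Windows profile.
$action  = New-ScheduledTaskAction -Execute 'powershell.exe' `
    -Argument '-NoProfile -Command "Import-Module Posh-ACME; Submit-Renewal -AllAccounts"'
$trigger = New-ScheduledTaskTrigger -Daily -At 3am
Register-ScheduledTask -TaskName 'Posh-ACME Renewal' -Action $action -Trigger $trigger
```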

[screenshot]

What I want to try next is how to get these certs updated with Exchange and ADFS. Need to figure out if I can automatically copy these downloaded certs to my Exchange and ADFS servers. 

Outlook auto-discover & DNS weirdness

It’s 2am and I spent the last 2-3 hours chasing a shitty problem in my home lab to which I haven’t yet found a satisfactory answer. What a waste of time (sort of)!

It all began when I enabled MAPI/HTTP on my home Exchange 2013 and noticed that Outlook (2016 & 2019) was still connecting to it via RPC/HTTP. To troubleshoot that I deleted my Outlook profile to see if a fresh auto-discovery would get it to connect using MAPI/HTTP. That ended up being a long rabbit hole!

Background: all my user accounts are of the form firstname.lastname@raxnet.global. I have auto-discovery set up via SCP (here's a good blog post to read if you want an overview of auto-discovery) and in the past when I launched Outlook it used to give me a prompt like this –

[screenshot]

– followed by this one, where I choose “Exchange” and am done. 

[screenshot]

Today, however, I was getting prompted for a password after the first screen, and when I entered the password it complained that the password was wrong. Moreover, I noticed that it was trying to connect via IMAP and picking up an external IMAP server for this domain (the domain raxnet.global is set up externally too, with IMAP auto-discovery in the form of SRV records for _imaps._tcp.raxnet.global). That didn't make sense. All my internal clients were pointing to the internal DC for DNS, and this DC hosted the raxnet.global zone itself, so there was no reason why queries should be going to the Internet.

[screenshot]

 

[screenshot]

(Notice the imap.fastmail.com? That should not be happening.)

Moreover, if I did an nslookup -type=SRV _imaps._tcp.raxnet.global. from my clients they weren't resolving the name either. How was Outlook able to resolve this then?! (I still don't have an answer to this, so don't get your hopes up, dear reader.) Clients can resolve the name only if they query an external DNS server, but that wasn't happening in my case.

[screenshot]

Here's an article on Outlook 2016's implementation of auto-discover. It too tells me that SCP is the default mechanism for auto-discovery when Outlook discovers it's on a domain-joined machine. Starting from Outlook 2016, however, there is a special case: if Outlook feels the account is an Office 365 enabled one, it will try and auto-discover using Office 365 endpoints (typically https://autodiscover-s.outlook.com/autodiscover/autodiscover.xml or https://autodiscover-s.partner.outlook.cn/autodiscover/autodiscover.xml). My domain isn't on Office 365, but what the heck, I disabled that via a registry key anyway. No luck; it still goes to IMAP.
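
(The value usually cited for this is ExcludeExplicitO365Endpoint under the per-user Outlook AutoDiscover key – something like the below, where "16.0" covers Outlook 2016/2019.)

```powershell
# Tell Outlook 2016/2019 not to try the Office 365 autodiscover endpoint for this user
$key = 'HKCU:\Software\Microsoft\Office\16.0\Outlook\AutoDiscover'
New-Item -Path $key -Force | Out-Null    # create the key if it doesn't already exist
New-ItemProperty -Path $key -Name 'ExcludeExplicitO365Endpoint' -PropertyType DWord -Value 1 -Force
```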

What I don't understand is why it keeps ignoring SCP. It is as if Outlook thinks I am not domain joined and is magically able to query external DNS. The Windows VM on which Outlook runs can't query the outside world, so how is Outlook doing it?

I verified the SCP end points in Exchange (this is a good article to read) and also came across this script which you can run on the client side to make it query AD for the SCP endpoint via LDAP. (I couldn't access the blog so had to get a cached copy via Google, so I am going to keep the script's idea here for my future reference. All credit to the author of that blog.)
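
In essence the script does something like this – query the Configuration partition over LDAP for Autodiscover service connection points and print their URLs (a simplified sketch, not the original author's exact code, which also works out the client's AD site):

```powershell
# Find Autodiscover SCP objects in the Configuration partition via LDAP.
# 77378F46-2C66-4aa9-A6A6-3E7A48B19596 is the well-known Autodiscover keyword GUID.
$rootDse  = [ADSI]'LDAP://RootDSE'
$configNC = $rootDse.configurationNamingContext
$searcher = New-Object System.DirectoryServices.DirectorySearcher([ADSI]"LDAP://$configNC")
$searcher.Filter = '(&(objectClass=serviceConnectionPoint)(keywords=77378F46-2C66-4aa9-A6A6-3E7A48B19596))'
$searcher.FindAll() | ForEach-Object {
    [pscustomobject]@{
        Server = $_.Properties['name'][0]
        Url    = $_.Properties['servicebindinginformation'][0]
    }
}
```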

This correctly returned my client machines’ site as well as the autodiscover URL. 

I have no idea why auto-discover was failing. My hunch at this point was that Outlook was querying DNS some other way, because of which it was thinking it is not domain joined, and so all the usual auto-discovery was failing until it hit upon the Internet-facing IMAP auto-discovery. So what was my DNS setup like?

All my clients were pointing to the DC for DNS via DHCP. The DC in turn was forwarding to my OPNSense router, which had Unbound DNS running on it. I verified again, from the client and the DC, that I was unable to resolve the external records of raxnet.global. I suspected OPNSense at this point, so I turned off Unbound DNS to see if that impacted the auto-discovery. Sure enough, it did! Now Outlook was behaving as it used to.

Unbound DNS was set to listen only on the servers' subnet. So there was no way a query could come from my clients' subnet to Unbound (my bad, I should have checked this by querying Unbound directly – too sleepy now to enable Unbound again and see what happens). I couldn't see any other settings either, so I stopped Unbound and added root hints to the DC so it can talk to the Internet directly. (Here yet another thing wasted my time. I always set DNS servers to listen on "All IP addresses" – including IPv6. With that setting my DC was unable to resolve external names using the root hints. If I set it to listen only on IPv4, the DC resolves as expected! Something in the IPv6 setup was causing an issue. Funnily enough, a colleague had mentioned this in the past but I never believed him.)

[screenshot]

That done, my clients would now talk to the DC for DNS queries (as before), and the DC would go and look things up on its own (unlike before). No more forwarder. I tried Outlook again with this setting and auto-discover worked correctly.

The issue isn't Unbound or OPNSense. The issue is that when my DC is using a forwarder (any forwarder, even 8.8.8.8 for instance) Outlook thinks it is not domain joined and can somehow see my external DNS records – breaking auto-discovery. I don't know why this happens, especially considering the OS on which Outlook is running can't resolve the external records and can in fact resolve the internal LDAP end-points correctly. Something in the way Outlook is sending DNS queries is causing it to behave differently. (Or maybe I am just too sleepy and can't think of a simple explanation now!)

Setting up IPsec tunnel from OPNsense at home to Azure

This is mainly based on this and this blog post, with additional input from my router's FAQ for the router-specific stuff.

I have a virtual network in Azure with a virtual network gateway. I want a site-to-site VPN from my home to Azure so that my home VMs can talk to my Azure VMs. I don't want to do Point-to-Site as I have previously done, because I want all my VMs to be able to talk to Azure instead of setting up a P2S connection from each of them.

My home setup consists of VMware Fusion running a bunch of VMs, one of which is OPNSense. This is my home gateway – it provides routing between my various VM subnets (I have a few) and acts as the DNS resolver for the internal VMs, etc. OPNSense has one interface that is bridged to my MacBook's network, so it is not NAT'd behind the MacBook; it has an IP on the same network as the MacBook. I decided to do this as it is easier to port forward from my router to OPNSense.

OPNSense has an internal address of 192.168.1.23. On my router I port forward UDP ports 500 & 4500 to this. I also have IPSec Passthrough enabled on the router (that’s not mentioned in the previous link but I came across it elsewhere). 

My home VMs are in the 10.0.0.0/8 address space (in which there are various subnets that OPNSense is aware of). My Azure address space is 172.16.0.0/16.

First I created a virtual network gateway in Azure. It's of the "VpnGw1" SKU. I enabled BGP and set the ASN to 65515 (which is the default). This gateway is in the gateway subnet and has an IP of 172.16.254.254. (I am not actually using BGP, but I set this so I can start using it in the future. One of the articles I link to has more instructions.)

Next I created a local network gateway with the public IP of my home router and an address space of 10.0.0.0/8 (that of my VMs). Here too I enabled BGP settings and assigned an ASN of 65501 and set the peer address to be the internal address of my OPNSense router – 192.168.1.23. 

Next I went to the virtual network gateway section and under connections I created a new site-to-site (IPsec) connection. Here I have to select the local network gateway I created above, and also create a pre-shared key (make up a random passphrase – you need this later).

That’s all on the Azure end. Then I went to OPNSense and under VPN > IPsec > Tunnel Settings I created a new phase 1 entry. 

[screenshot]

I think most of it is default. I changed the Key Exchange to "auto" from v2. For "Remote gateway" I filled in my Azure virtual network gateway's public IP. Not shown in this screenshot is the pre-shared key that I put in Azure earlier. I filled in the rest of it thus –

[screenshot]

Of particular note are the algorithms. From the OPNSense logs I noticed that these are the combinations Azure supports –

IKE:AES_CBC_256/HMAC_SHA1_96/PRF_HMAC_SHA1/MODP_1024
IKE:AES_CBC_256/HMAC_SHA2_256_128/PRF_HMAC_SHA2_256/MODP_1024
IKE:AES_CBC_128/HMAC_SHA1_96/PRF_HMAC_SHA1/MODP_1024
IKE:AES_CBC_128/HMAC_SHA2_256_128/PRF_HMAC_SHA2_256/MODP_1024
IKE:3DES_CBC/HMAC_SHA1_96/PRF_HMAC_SHA1/MODP_1024
IKE:3DES_CBC/HMAC_SHA2_256_128/PRF_HMAC_SHA2_256/MODP_1024

I didn't know this, and so initially I had selected a DH key group size of 2048 and my connections were failing. From the logs I came across the above and changed it to 1024 (the 2048 is still present but it won't get used as 1024 will be negotiated; I should remove 2048 really – I forgot to do it before taking this screenshot, and it doesn't matter anyway). The highlighted entry is the combination I chose to go with.

After this I created a phase 2 entry. This is where I define my local and remote subnets as seen below:

[screenshot]

I left everything else at their defaults. 

[screenshot]

And that’s it. After that things connected and I could see a status of connected on the Azure side as well as on my OPNSense side under VPN > IPsec > Status Overview (expand the “I” in the Status column for more info). Logs can be seen under VPN > IPsec > Log File in case things don’t work out as expected. 

I don’t have any VMs in Azure but as a quick test I was able to ping my Azure gateway on its internal IP address (172.16.254.254) from my local VMs. 

Of course a gotcha with this configuration is that when my home public IP changes (as it is a dynamic public IP) this will break. It's not a big deal for me as I can log in to Azure and enter the new public IP in the local network gateway, but I did find this blog post giving a way of automating this.

New ADFS configuration wizard does not pick up SSL certificate

Was setting up ADFS in my home lab and I encountered the following issue. Even though I had a certificate generated and imported into the personal certificate store of the ADFS server, it was not being picked up by the configuration wizard.

[screenshot]

I tried exporting the certificate with its private key as a PFX file and clicking the Import button above. Didn’t help either.

I also tried the following which didn’t help (but since I took some screenshots and I wasn’t aware of this way of tying certificates to a service account, I thought I’d include it here anyways). 

Launch mmc and add the Certificates snap-in. Choose "Service account".

[screenshot]

Then "Local Computer", and "Active Directory Federation Services".

[screenshot]

Import the PFX certificate to its personal store.

This too didn’t help! 

Finally, what did help was creating a new certificate with the CN and SAN names different from the server name. As in, my original certificate had a CN of "myservername.fqdn" along with SANs of "myservername.fqdn" and "adfs.fqdn" (the latter being what my ADFS federation service name would be), but for the new cert I generated I went with a CN of "adfs.fqdn" and SANs of "adfs.fqdn" and "myservername.fqdn". That worked!

[Aside] Offline CRL errors when requesting a certificate

This blog post saved my bacon many times in my home lab. 

Remember this command: 
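
It's the usual lab workaround for "revocation server was offline" errors when requesting a certificate – tell the CA to ignore offline CRL failures and restart certsvc (run on the issuing CA):

```cmd
certutil -setreg ca\CRLFlags +CRLF_REVCHECK_IGNORE_OFFLINE
net stop certsvc && net start certsvc
```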

MacOS VPN doesn’t use the VPN DNS

Continuing with my previous post … as part of configuring it I went to "Advanced" > "DNS" in the VPN connection and put in my remote-end DNS server and the domain name to search. On Windows 10 I didn't even have to do this – the remote DNS server and domains were automatically configured as part of connecting. Anyways, once I put these in I thought it should just work out of the box, but it didn't.

So it turns out many others have noticed and complained about this. I couldn't find a solution as such, but learnt about scutil --dns in the process. Even though Mac OS has an /etc/resolv.conf file it does not seem to be used; rather, the OS has its own way of doing DNS resolution and scutil --dns lets you see what is configured. (I am very, very sketchy on the details and to be honest I didn't make much of an effort to figure them out either.) In my case the output of this command showed that the VPN-provided resolver for my custom domain was being seen by scutil and yet it wasn't being used – no idea why.

I would like to point out this post though that shows how one can use scutil to override the DHCP or VPN assigned DNS servers with another. Good to know the kind of things scutil can do.

And while on this confusing topic it is worth pointing out that tools like nslookup and dig use the resolver provided in /etc/resolv.conf so these are not good tools if you want to test what an average Mac OS program might be resolving a particular name to. Best to just ping and see what IP a name resolves to.

Anyways, I didn’t want to go down a scripting route like in that nice blog post so I tried to find an alternative.

Oh, almost forgot! Scoped queries. If you check out this SuperUser post you can see the output of scutil --dns and come across the concept of scoped queries. The idea (I think) is that you can say domain xyz.com should be resolved using a particular name server, domain abc.com should be resolved via another, and so on. From that post I also got the impression you can scope it per interface … so the idea would be that you can scope the name server for my VPN interface to be one, while the name server for my other interfaces to be another. But this wasn’t working in my case (or I had configured something wrong – I dunno. I am a new Mac OS user). Here was my output btw so you can see my Azure hosted domain rakhesh.net has its own name server, while my home domain rakhesh.local has its own (and don’t ask me where the name server for general Internet queries is picked up from … I have no idea!).

Anyways, here's a link to scutil for my future reference. And story 1 and story 2 on mDNSResponder, which seems to be the DNS resolver in Mac OS. And while on mDNSResponder, if you want to flush your local DNS cache you can do the following (thanks to this help page):
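
On the versions I've used it's this combo of flushing the directory services cache and poking mDNSResponder:

```sh
# Flush the local DNS cache and tell mDNSResponder to reload
sudo dscacheutil -flushcache; sudo killall -HUP mDNSResponder
```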

What a mouthful! :)

Also, not related to all this, but something I had to Google on as I didn’t know how to view the routing table in Mac OS. If you want to do the same then netstat -nr is your friend.

Ok, so going back to my problem. I was reading the resolver(5) man page and came across the following:

Mac OS X supports a DNS search strategy that may involve multiple DNS resolver clients.

Each DNS client is configured using the contents of a single configuration file of the format described below, or from a property list supplied from some other system configuration database. Note that the /etc/resolv.conf file, which contains configuration for the default (or “primary”) DNS resolver client, is maintained automatically by Mac OS X and should not be edited manually. Changes to the DNS configuration should be made by using the Network Preferences panel.

Mac OS X uses a DNS search strategy that supports multiple DNS client configurations. Each DNS client has its own set of nameserver addresses and its own set of operational parameters. Each client can perform DNS queries and searches independent of other clients. Each client has a symbolic name which is of the same format as a domain name, e.g. “apple.com”. A special meta-client, known as the “Super” DNS client acts as a router for DNS queries. The Super client chooses among all available clients by finding a best match between the domain name given in a query and the names of all known clients.

Queries for qualified names are sent using a client configuration that best matches the domain name given in the query. For example, if there is a client named “apple.com”, a search for “www.apple.com” would use the resolver configuration specified for that client. The matching algorithm chooses the client with the maximum number of matching domain components. For example, if there are clients named “a.b.c”, and “b.c”, a search for “x.a.b.c” would use the “a.b.c” resolver configuration, while a search for “x.y.b.c” would use the “b.c” client. If there are no matches, the configuration settings in the default client, generally corresponding to the /etc/resolv.conf file or to the “primary” DNS configuration on the system are used for the query.

If multiple clients are available for the same domain name, the clients are ordered according to a search_order value (see above). Queries are sent to these resolvers in sequence by ascending value of search_order.

The configuration for a particular client may be read from a file having the format described in this man page. These are at present located by the system in the /etc/resolv.conf file and in the files found in the /etc/resolver directory. However, client configurations are not limited to file storage. The implementation of the DNS multi-client search strategy may also locate client configurations in other data sources, such as the System Configuration Database. Users of the DNS system should make no assumptions about the source of the configuration data.

If I understand this correctly, what it is saying is that:

  1. The settings defined in /etc/resolv.conf are kind of like the fall-back/default?
  2. Each domain (confusingly referred to as “client”) in the man-page can have its own settings. You define these as files in /etc/resolver/. So I could have a file called /etc/resolver/google.com that defines how I want the “google.com” domain to be resolved – what name servers to use etc. (these are the typical options one finds in /etc/resolv.conf).
  3. The system combines all these individual definitions, along with dynamically created definitions such as when a VPN is established (or any DHCP provided definitions I’d say, including wired and wireless) into a configuration database. This is what scutil can query and manipulate.

What this means for me though is that I can create a file called /etc/resolver/rakhesh.net (my Azure domain is rakhesh.net) with something like this:
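
(The IP below is just a stand-in for the Azure-side DNS server's address.)

```
# /etc/resolver/rakhesh.net -- send queries for rakhesh.net to the VPN-side DNS server
nameserver 172.16.1.4
```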

Thus any requests for rakhesh.net will go via this name server. When I am not connected to VPN these requests will fail as the DNS server is not reachable, but when connected it will work fine.

What if I want to take this one step further though? As in, I want DNS requests for rakhesh.net to go to its proper external DNS server when I am not on VPN, but go via the internal DNS server when I am on VPN. That too is possible. All I have to do is have multiple files – since I can't call all of them /etc/resolver/rakhesh.net – and within each specify the domain name via the domain parameter and also define the preference via a search_order parameter. The one with the lower number gets tried first.

So I now have two files. For internal queries I have /etc/resolver/rakhesh.net.azure (the name doesn't matter):
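
(Again, the IP is a placeholder for my internal DNS server.)

```
# /etc/resolver/rakhesh.net.azure -- internal resolver, tried first when the VPN is up
domain rakhesh.net
nameserver 172.16.1.4
search_order 1
timeout 5
```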

For external queries I have /etc/resolver/rakhesh.net.inet:
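
(8.8.8.8 here stands in for whichever external resolver you prefer.)

```
# /etc/resolver/rakhesh.net.inet -- external resolver, tried second
domain rakhesh.net
nameserver 8.8.8.8
search_order 2
```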

The internal file has higher priority. I also added a timeout of 5 seconds so it doesn't spend too much time trying to contact the name server if the VPN is not connected. Easy peasy. This way my queries go via the internal DNS servers if I am connected to the VPN, and via external DNS servers if I am not.

If I now look at the output of scutil --dns I see all this info captured:

So that’s it. Hope this helps someone!

 

DNS SRV records used by AD

Just thought I’d put these here for my own easy reference. I keep forgetting these records and when there’s an issue I end up Googling and trying to find them! These are DNS records you can query to see if clients are able to lookup the PDC, GC, KDC, and DC of the domain you specify via DNS. If this is broken nothing else will work. :)

  • PDC: _ldap._tcp.pdc._msdcs.<DnsDomainName>
  • GC: _ldap._tcp.gc._msdcs.<DnsDomainName>
  • KDC: _kerberos._tcp.dc._msdcs.<DnsDomainName>
  • DC: _ldap._tcp.dc._msdcs.<DnsDomainName>

You would look this up using nslookup -type=SRV <Record>.

As a refresher, SRV records are of the form _Service._Proto.Name TTL Class SRV Priority Weight Port Target. The _Service._Proto.Name is what we are looking up above, just that our name space is _msdcs.<DnsDomainName>.

Service SIDs etc.

Just so I don’t forget. 

The SCOM Agent on a server is called “Microsoft Monitoring Agent”. The short service name is “HealthService” and is set to run as Local System (NT Authority\System). Although not used by default, this service also has a virtual account created automatically by Windows called “NT SERVICE\HealthService” (this was a change introduced in Server 2008). 

As a refresher to myself and any others – this is a virtual account, i.e. a local account managed by Windows and one which we don't have much control over (we can't change the password, etc.). All services, even though they may be set to run under Local System, can also run in a restricted mode under an automatically created virtual account "NT Service\<ServiceName>". As with Local System, when a service running under such an account accesses a remote system it does so using the credentials of the machine it is running on – i.e. "<DomainName>\<ComputerName>$".

Since these virtual accounts correspond to a service, and each virtual account has a unique SID, such virtual accounts are also called service SIDs. 

Although all services have a virtual account, it is not used by default. To see whether a virtual account is used or not one can use the sc qsidtype command. This queries the type of the SID of the virtual account. 
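
For the SCOM agent that's the following (from an elevated prompt; use sc.exe rather than sc if running this from PowerShell, where sc is an alias for Set-Content):

```cmd
REM Query the service SID type of the SCOM agent service
sc.exe qsidtype HealthService
```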

A type of NONE as in the above case means this virtual account is not used by the service. If we want a service to use its virtual account we must change this type to “Unrestricted” (or one could set it to “Restricted” too which creates a “write restricted” token – see this and this post to understand what that means). 

The sc sidtype command can be used to change this. 
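
Along these lines:

```cmd
REM Set the service SID type to unrestricted so the virtual account gets used
sc.exe sidtype HealthService unrestricted
```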

A service SID is of the form S-1-5-80-{SHA1 hash of short service name}. You can find this via the sc showsid command too:
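
Like so:

```cmd
REM Show the service SID (and whether it is active) for the SCOM agent service
sc.exe showsid HealthService
```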

Note the status “Active”? That’s because I ran the above command after changing the SID type to “Unrestricted”. Before that, when the service SID wasn’t being used, the status was “Inactive”. 

So why am I reading about service SIDs now? :) It’s because I am playing with SCOM and as part of adding one of our SQL servers to it for monitoring I started getting alerts like these:

I figured this would be because the account under which the Monitoring Agent runs has no permissions to the SQL databases, so I looked at RunAs accounts for SQL and came across this blog post. Apparently the in thing nowadays is to change the Monitoring Agent to use a service SID and give that service SID access to the databases. Neat, eh! :)

I did the first step above – changing the SID type to "Unrestricted" so the Monitoring Agent uses that service SID. So the next step is to give it access to the databases. This can be done by executing the following in SQL Management Studio after connecting to the SQL server in question:
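
(A sketch based on the KB article's version; the original blog post has a more locked-down variant.)

```sql
-- Create a login for the HealthService service SID and make it a sysadmin
-- so the SCOM agent can monitor all databases on this instance.
USE [master];
GO
CREATE LOGIN [NT SERVICE\HealthService] FROM WINDOWS;
GO
ALTER SERVER ROLE [sysadmin] ADD MEMBER [NT SERVICE\HealthService];
GO
```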

The comments explain what it does. And yes, it gives the “NT Service\HealthService” service SID admin rights to the server. I got this code snippet from this KB article but the original blog post I was reading has a version which gives minimal rights (it has some other cool goodies too, like a task to create this automatically). I was ok giving this service SID admin rights. 

Creating an OMS tile for computer online/ offline status

This is by no means a big deal, nor am I trying to take credit. But it is something I setup a few days ago and I was pleased to see it in action today, so wanted to post it somewhere. :)

So as I said earlier I have been reading up on Azure monitoring these past few days. I needed something to aim towards and this was one of the things I tried out.

When you install the “Agent Health” solution it gives a tile in the OMS home page that shows the status of all the agents – basically their offline/ online status based on whether an agent is responsive or not.

The problem with this tile is that it only looks for servers that are offline for more than 24 hours! So it is pretty useless if a server went down say 10 mins ago – I can keep staring at the tile for the whole day and that server will not pop up.

I looked at creating something of my own and this is what I came up with –

If you click on the tile it shows a list of servers with the offline ones on top. :)

I removed the computer names in the screenshot, which is why it is blank.

So how did I create this?

I went into View Designer and added the “Donut” as my overview tile. 

Changed the name to “Agent Status”. Left description blank for now. And filled the following for the query:
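
The query was something like this (a sketch; the walkthrough below explains each step):

```kusto
// Summarize heartbeats per computer, mark anything not seen in 15 minutes as Offline,
// then count computers per status for the donut.
Heartbeat
| summarize LastSeen = max(TimeGenerated) by Computer
| extend Status = iff(LastSeen < ago(15m), "Offline", "Online")
| summarize AggregatedValue = count() by Status
| order by Status desc
```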

Here’s what this query does. First it collects all the Heartbeat events. These are piped to a summarize operator. This summarizes the events by Computer name (which is an attribute of each event) and for each computer it computes a new attribute called LastSeen which is the maximum TimeGenerated timestamp of all its events. (You need to summarize to do this. The concept feels a bit alien to me and I am still getting my head around it. But I am getting there).

This summary is then piped to an extend operator which adds a new attribute called Status. (BTW attributes can also be thought of as columns in a table. So each event is a row with the attributes corresponding to columns). This new attribute is set to Offline or Online depending on whether the previously computed LastSeen was less than 15 mins or not.

The output of this is sent to another summarize, which now summarizes it by Status with a count of the number of events of each type.

And this output is piped to an order to sort it in descending order. (I don't need it for this overview tile but I use the same query later on too, so wanted to keep it consistent.)

All good? Now scroll down and change the colors if you want to. I went with Color1 = #008272 (a dark green) and Color 2 = #ba141a (a dark red).

That’s it, do an apply and you will see the donut change to reflect the result of the query.

Now for the view dashboard – which is what you get when someone clicks the donut!

I went with a "Donut & list" for this one. In the General section I changed Group Title to "Agent Status", in the Header section I changed Title to "Status", and in the Donut section I pasted the same query as above. Also changed the colors to match the ones above. Basically the donut part is the same as before because you want to see the same output. It's the list where we make some changes.

In the List section I put the following query:
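
Something like this (again a sketch, per the description that follows):

```kusto
// Same summary as the donut, but keep one row per computer and put the
// oldest heartbeat (i.e. the longest-offline server) at the top.
Heartbeat
| summarize LastSeen = max(TimeGenerated) by Computer
| extend Status = iff(LastSeen < ago(15m), "Offline", "Online")
| sort by bin(LastSeen, 1m) asc
```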

Not much of a difference from before, except that I don't do any second summarizing. Instead I sort by the LastSeen attribute after rounding it to 1 minute. This way the oldest heartbeat event comes up on top – i.e. the server that has been offline the longest. In the Computer Titles section I changed the Name to "Computer" and Value to "Last Seen". I think there is some way to add a heading for the Offline/Online column too but I couldn't figure it out. Also, the Thresholds feature seemed cool – it would be nice if I could color the offline ones red for instance, but I couldn't figure that out either.

Lastly I changed the click-through navigation action to be “Log Search” and put the following:
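
Along these lines:

```kusto
// Computers whose last heartbeat is older than 15 minutes
Heartbeat
| summarize LastSeen = max(TimeGenerated) by Computer
| where LastSeen < ago(15m)
```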

This just gives a list of computers that have been offline for more than 15 mins. I did this because the default action tries to search on my Status attribute and fails; so thought it’s best I put something of my own.

And that’s it really! Like I said no biggie, but it’s my first OMS tile and so I am proud. :)

ps. This blog post brought to you by the Tamil version of the song “Move Your Body” from the Bollywood movie “Johnny Gaddar” which for some reason has been playing in my head ever since I got home today. Which is funny coz that movie is heavily inspired by the books of James Hadley Chase and I was searching for his books at Waterstones when I was in London a few weeks ago (and also yesterday online).

[Aside] Various Azure links

My blog posting has taken a turn for the worse. Mainly coz I have been out of the country, and since returning I have been busy reading up on Azure monitoring.

Anyways, some quick links to tabs I want to close now but which will be useful for me later –

  • A funny thing with Azure monitoring (OMS/ Log Analytics) is that it can't just do simple WMI queries against your VMs to check if a service is running. Crazy, right! So you have to resort to tricks like monitoring the event logs to see any status messages. Came across this blog post with a neat idea of using performance counters. I came across that in turn from this blog post that has a different way of using the event logs.
  • We use load balancers in Azure and I was thinking I could tap into their monitoring signals (from the health probes) to know if a particular server/ service is up or down. In a way it doesn’t matter if a particular server/ service is down coz there won’t be a user impact coz of the load balancer, so what I am really interested in knowing is whether a particular monitored entity (from the load balancer point of view) is down or not. But turns out the basic load balancer cannot log monitoring signals if it is for internal use only (i.e. doesn’t have a public IP). You either need to assign it a public IP or use the newer standard load balancer.
  • Using OMS to monitor and send alert for BSOD.
  • Using OMS to track shutdown events.
  • A bit dated, but using OMS to monitor agent health (has some queries in the older query language).
  • A useful list of log analytics query syntax (it’s a translation from old to new style queries actually but I found it a good reference)

Now for some non-Azure stuff which I am too lazy to put in a separate blog post:

  • A blog post on the difference between application consistent and crash consistent backups.
  • At work we noticed that ADFS seemed to break for our Windows 10 machines. I am not too clear on the details as it seemed to break with just one application (ZScaler). By way of fixing it we came across this forum post, which detailed the same symptoms as ours, and the fix suggested there (Set-ADFSProperties -IgnoreTokenBinding $True) did the trick for us. So what is this token binding thing?
    • Token Binding seems to be like cookies for HTTPS. I found this presentation to be a good explanation of it. Basically token binding binds your security token (like cookies or ADFS tokens) to the TLS session you have with a server, such that if anyone were to get hold of your cookie and try to use it in another session it will fail. Your tokens are bound to that TLS session only. I also found this medium post to be a good techie explanation of it (but I didn’t read it properly*). 
    • It seems to be enabled on the client side from Windows 10 1511 and upwards.
    • I saw the same recommendation in these Microsoft Docs on setting up Azure stack.

Some excerpts from the medium post (but please go and read the full one to get a proper understanding). The excerpt is mostly for my reference:

Most of the OAuth 2.0 deployments do rely upon bearer tokens. A bearer token is like ‘cash’. If I steal 10 bucks from you, I can use it at a Starbucks to buy a cup of coffee — no questions asked. I do not want to prove that I own the ten dollar note.

OAuth 2.0 recommends using TLS (Transport Layer Security) for all the interactions between the client, authorization server and resource server. This makes the OAuth 2.0 model quite simple with no complex cryptography involved — but at the same time it carries all the risks associated with a bearer token. There is no second level of defense.

OAuth 2.0 token binding proposal cryptographically binds security tokens to the TLS layer, preventing token export and replay attacks. It relies on TLS — but since it binds the tokens to the TLS connection itself, anyone who steals a token cannot use it over a different channel.

Lastly, I came across this awesome blog post (which too I didn’t read properly* – sorry to myself!) but I liked a lot so here’s a link to my future self – principles of token validation.

 

* I didn’t read these posts properly coz I was in a “troubleshooting mode” trying to find out why ADFS broke with token binding. If I took more time to read them I know I’d get side tracked. I still don’t know why ADFS broke, but I have an idea.

Asus RT-AC68U router, firmware, etc. (contd.)

Continuing a previous post of mine as a note to myself.

Tried to flash my Asus RT-AC68U with the Advanced Tomato firmware and that was a failed attempt. The router just kept rebooting. Turns out Advanced Tomato doesn’t work on the newer models. Bummer! Not that I particularly wanted Advanced Tomato. It looked good and I wanted to try it out, that’s all. Asus Merlin suits me just fine.

Quick shout out to "Yet another malware block script", which I've now got running on the Asus RT-AC68U. And I also came across and have installed AB-Solution, which seems to be the equivalent of Pi-Hole but for routers. I got rid of Pi-Hole yesterday as I moved the Asus back to being my primary router (replacing the ISP-provided one) and I didn't want to depend on a separate machine for DNS etc. I wanted the Asus to do everything, including ad-blocking via DNS, so I Googled what alternatives there are for Asus and came across AB-Solution. Haven't explored it much except for installing it. Came across it via this post.

That’s all for now!

As an aside, I feel so outdated using Linux nowadays. :( The last time I used Linux was 4-5 years ago – Debian and Fedora etc. Now most of the commands I am used to from those times don't work any more. Even simple stuff like ifconfig or route. It's all systemd based now. I had to reconfigure the IP address of this Debian VM where I installed Pi-Hole, and I thought I could do it, but for some reason I didn't manage. (And no, I didn't read the docs! :p)

This is not to blame Linux or systemd or progress or anything like that. Stuff changes. If I was used to Windows 2003 and came across Windows 2008 I'd be unused to its differences too – especially in the command line. Similarly from Server 2008 to 2012. It's more a reflection of me being out of touch with Linux and now too lazy to try and get back on track. :)

HPE Synergy and eFuse Reset

In the HPE BladeSystem c7000 Enclosures one can do something called an eFuse reset to power cycle any of the server blades. I have blogged about it previously here.

Now we are on HPE Synergy 12000 Frames at work and I wanted to do something similar. One of the compute modules (aka a server :p) was complaining that the server profile couldn't be applied due to some errors. The compute module was off and refusing to power on, so it looked like there was nothing we could do short of removing it from the frame and putting it back. I felt an eFuse reset would do the trick here – it does the same thing, after all.

I couldn’t find any way of doing this via an SSH into the frame’s OneView (which is the equivalent of the Onboard Administrator in a c7000 Enclosure) but then found this PowerShell library from HPE. Now that is pretty cool! Here’s a wiki page too with all the cmdlets – a good page to bookmark and keep handy. Using this I was able to power cycle the compute module.

1) Install the library following instructions in the first link.

2) Login.

3) Get a list of the modules in the enclosure (not really required but I did anyways to confirm the PowerShell view matches my expectations).

4) Now assign the enclosure object containing the module I want to reset to a variable. We need this for the next step.

In my case the Synergy 12000 Frame (capital “F”) is made up of two frame enclosures. (The frame enclosure is where you have the compute modules and interconnects and frame link modules etc).  The module I want to reset is in bay 1 of frame 2. So below I assign the frame 2 object to a variable.

5) Now do the actual eFuse reset.
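
Putting steps 1-5 together, it looks roughly like this (cmdlet names are as per the HPOneView library's wiki – double-check against the module version you install; the Composer hostname and frame name below are made up):

```powershell
# Step 1: install the library (the module name on the PowerShell Gallery includes the OneView version)
Install-Module -Name HPOneView.420

# Step 2: log in to the Synergy Composer / OneView
Connect-HPOVMgmt -Hostname 'composer.fqdn' -Credential (Get-Credential)

# Step 3: list the frame enclosures OneView knows about
Get-HPOVEnclosure

# Step 4: grab the frame enclosure that holds the module I want to reset (frame 2 in my case)
$frame2 = Get-HPOVEnclosure -Name 'Frame2'

# Step 5: eFuse reset of the compute module in bay 1 of that frame
Reset-HPOVEnclosureDevice -Enclosure $frame2 -Component Device -DeviceID 1 -Efuse
```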

The -Component parameter can take as argument Device (for compute modules), FLM (for Frame Link Modules), ICM (for InterConnect Modules), and Appliance (for the Synergy Composer or Image Streamer). The -DeviceID parameter is the bay number for the type of component we are trying to reset (so -Component Device -DeviceID 1 is not the same as -Component ICM -DeviceID 1).

An eFuse reset is optional. You could do a simple reset too by skipping the -Efuse switch. The Appliance and ICM components only do eFuse reset though. I am not sure what a regular (non eFuse) reset does.

[Aside] Web Servers

I came across these recently and wanted to put them here as a bookmark to myself.

  • h5ai – A modern file browsing UI for web server. Looks amazing!
  • HFS – HTTP File Server. It's a web server and also a way to send and receive files over HTTP. I haven't used it but my colleagues recently did.
  • Fenix – A web server you can run on your desktop or laptop. Looks nice too!
  • TinyWeb – A very tiny web server you can run on your desktop or laptop.
  • Caddy – an HTTP/2 web server with automatic HTTPS. Got to check it out sometime.

Asus RT-AC68U router, firmware, etc.

Bought an Asus RT-AC68U router today. I didn't like my existing D-Link much, and a colleague bought the Asus and was all praise, so I thought why not try that.

Was a bit put off that many of the features (especially the parental control ones) seem to be tied up with a Trend Micro service that's built into the router. When you enable these you get an EULA agreement from Trend Micro, and while I usually just click through EULA agreements this one caught my eye coz it said somewhere that Asus takes no responsibility for any actions of Trend Micro, and so they pretty much wash their hands of whatever Trend Micro might do once you sign up for it. That didn't sound very nice. I mean, yes, I knew the router had some Trend Micro elements in it, and I have used Trend Micro in the past and have no beef with them, but I bought an Asus router and I expect them to take responsibility for whatever they put in the box.

Anyways, Googling about it I found some posts like this, this, and this that echoed similar sentiments and put me off. It was upsetting as a lot of value I was hoping to get out of the router was centered around using Trend Micro, and since I didn’t want to accept the EULA I would never be able to use it.

I briefly thought of flashing some other firmware in the hope that it would give me more features. Advanced Tomato looks nice, but then I came across Asus WRT Merlin, which seems to be based on the official firmware but with some additional features and bug fixes, and a focus on performance and safety rather than new features. (Also, the official Asus firmware and the Merlin one have hardware NAT acceleration and proprietary NTFS drivers that offer better performance, while other third-party firmware don't have this. The hardware NAT only matters if your WAN connection is > 100Mbps, which wasn't so in my case.) Asus WRT Merlin looks good. The UI is the same as the official one, and it appears that the official firmware has slowly embraced many of the newer features of Merlin. Also, this discussion from the creator of the Merlin firmware on the topic of Trend Micro was good too. Wasn't as doom and gloom as the others (but I still haven't enabled the Trend Micro stuff nor do I plan on doing so).

The Merlin firmware is amazing. Flashing it is easy, and it gives some nifty new features. For example you can have custom config files that extend the inbuilt DHCP/ DNS server dnsmasq, add other 3rd party software, and so on. This official Wiki page is a good read. I came across this malware blocking script and installed it. I also made some changes to DHCP so that certain machines get different DNS servers (e.g. pointing my daughter's machine to Yandex.DNS). Here's a bit from my config file in case it helps –
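
The gist of it is below – tag the machine by its MAC address and hand that tag different DNS servers (the MAC address and the Yandex.DNS family-mode IPs are placeholders; on Merlin the file lives at /jffs/configs/dnsmasq.conf.add):

```
# /jffs/configs/dnsmasq.conf.add
# Tag my daughter's machine (by MAC address) when it asks for a DHCP lease
dhcp-host=AA:BB:CC:DD:EE:FF,set:kidsnet
# Machines with that tag get the Yandex.DNS family-mode servers instead of the router
dhcp-option=tag:kidsnet,option:dns-server,77.88.8.7,77.88.8.3
```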

This dnsmasq manpage was helpful, so was this page of examples. Also this StackOverflow post.

I liked this idea of having separate DHCP options for specific SSIDs, and also this one of having a separate SSID that's connected to a VPN (nice!). I wanted to try these but was feeling lazy so didn't get around to doing it. I read a lot about it though, and liked this post on having separate VLANs within the router. That post also explains the port numbering etc. of the router – it's a good read. I also wanted to see if it was possible to have a separate VLAN for an SSID – let's say have all my visitors connect to a different SSID with its own VLAN and IP range etc. I know I can do the IP range and stuff, but it looks like if I need a separate VLAN I'll have to give up one of the four ports on the back of the router. Basically the way things seem to be set up is that the 5 ports on the back of the router are part of the same switch, just that the WAN port is in its own VLAN 2 while the LAN ports are in their own VLAN 1. The WLAN (wireless) interfaces are bridged to this VLAN 1. So if you want a separate WLAN SSID with its own VLAN, you must create a new VLAN on one of the four ports and bridge the new SSID to that.

In the above, port 0 is the WAN, ports 1-4 are the LAN ports, and port 5 is the router itself (the SoC on the router). Since port 5 is part of both VLANs the router can route between them. The port numbers vary per model. Here's a post showing what the above output might look like in such a case. As a reference to myself, this person was trying to do something similar (I didn't read all the posts so there could be stuff I missed in there).

Lastly these two wiki pages from DD-WRT Wiki are worth referring to at some point – on the various ports, and multiple WLANs.

At some point, when I am feeling less lazy, I must fiddle around with this router a bit more. It’s fun, reminds me of my younger days with Linux. :)

[Aside] How to convert a manually added AD site connection to an automatically generated one

Cool tip via a Microsoft blog post. If you have a connection object in your AD Sites and Services that was manually created and you now want to switch over to letting KCC generate the connection objects instead of using the manual one, the easiest thing to do is convert the manually created one to an automatic one using ADSI Edit.

1.) Open ADSI Edit and go to the Configuration partition.

2.) Drill down to Sites > the site where the manual connection object is > Servers > the server where the manual connection object is created > NTDS Settings

3.) Right click on the manual connection object and go to properties

4.) Go to the Options attribute and change it from 0 to 1 (if it’s an RODC, then change it from 64 to 65)

5.) Either wait 15 minutes (that’s how often the KCC runs) or run repadmin /kcc to manually kick it off
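
The same thing can be scripted if you prefer – a rough sketch with the AD PowerShell module. Note it flips every manual connection it finds (anything without the IS_GENERATED bit set), so narrow the -SearchBase to the specific NTDS Settings container if you only want one:

```powershell
# Find nTDSConnection objects whose options attribute does not have the IS_GENERATED bit (0x1)
# set -- i.e. manually created connections -- and flip it on, then trigger the KCC.
Import-Module ActiveDirectory
$configNC = (Get-ADRootDSE).configurationNamingContext
Get-ADObject -LDAPFilter '(objectClass=nTDSConnection)' -SearchBase $configNC -Properties options |
    Where-Object { ($_.options -band 1) -eq 0 } |
    ForEach-Object { Set-ADObject $_ -Replace @{ options = ($_.options -bor 1) } }   # 64 becomes 65 for an RODC
repadmin /kcc    # or just wait ~15 minutes for the KCC to run on its own
```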

While on that topic, here’s a blog post to enable change notifications on manually created connections. More values for the options attribute in this spec document.

Also, a link to myself on the TechNet AD Replication topology section of Bridge All Site Links (BASL). Our environment now has a few sites that can’t route to all the other sites so I had to disable BASL today and was reading up on it.