Contact

Subscribe via Email

Subscribe via RSS/JSON

Categories

Creative Commons Attribution 4.0 International License
© Rakhesh Sasidharan

Elsewhere

NSX Edge application rules to use a different pool

Coming from a NetScaler background I was used to the concept of a failover server. As in a virtual server would have a pool of servers it would load balance amongst and if all of them are down I can define a failover server that could be used. You would define the failover server as a virtual server with no IP, and tell the primary virtual server to failover to this virtual server in case of issues.

Looking around for a similar option with NSX I discovered it’s possible using application rules. Instead of defining two virtual servers though, here you define two pools. One pool for the primary servers you want to load balance, the other pool for the failover server(s).

Then you create an application rule this:

Once again, the syntax is that of HAProxy. You define an ACLadfs_pri_down is what I am defining for my purposes as this is for load balancing some ADFS servers – and the criterion is nbsrv(pool-adfs-https-443) eq 0. The nbsrv criterion checks the pool you pas on to it (pool-adfs-https-443 in my case) and returns the number of servers that are up. So the ACL basically is a boolean one that is true if the number of usable servers is 0.

Next, the use_backend rule switches to using the backup pool I have defined (pool-bkpadfs-https-443 in this case) if the ACL is true.

That’s all. Pretty straightforward!

[Aside] Various ADFS links

No biggie, just as a reference to myself:

Update 16 July 2018: Needed to make a claim rule yesterday that converted the email address from an incoming claim to Name ID of an outgoing claim. The default GUI provided rule didn’t work, so I made a custom one:

I think I’ll add more such snippets here later.

Btw note to self: custom claim rules are useful if you want to combine multiple incoming claims – i.e. for an AND operation. If you don’t want to combine – i.e. you want to OR multiple claims – just add them as separate rules.

[Aside] Various Azure links

My blog posting has taken a turn for the worse. Mainly coz I have been out of country and since returning I am busy reading up on Azure monitoring.

Anyways, some quick links to tabs I want to close now but which will be useful for me later –

  • A funny thing with Azure monitoring (OMS/ Log Analytics) is that it can’t just do simple WMI queries against your VMs to check if a service is running. Crazy, right! So you have to resort to tricks like monitor the event logs to see any status messages. Came across this blog post with a neat idea of using performance counters. I came across that in turn from this blog post that has a different way of using the event logs.
  • We use load balancers in Azure and I was thinking I could tap into their monitoring signals (from the health probes) to know if a particular server/ service is up or down. In a way it doesn’t matter if a particular server/ service is down coz there won’t be a user impact coz of the load balancer, so what I am really interested in knowing is whether a particular monitored entity (from the load balancer point of view) is down or not. But turns out the basic load balancer cannot log monitoring signals if it is for internal use only (i.e. doesn’t have a public IP). You either need to assign it a public IP or use the newer standard load balancer.
  • Using OMS to monitor and send alert for BSOD.
  • Using OMS to track shutdown events.
  • A bit dated, but using OMS to monitor agent health (has some queries in the older query language).
  • A useful list of log analytics query syntax (it’s a translation from old to new style queries actually but I found it a good reference)

Now for some non-Azure stuff which I am too lazy to put in a separate blog post:

  • A blog post on the difference between application consistent and crash consistent backups.
  • At work we noticed that ADFS seemed to break for our Windows 10 machines. I am not too clear on the details as it seemed to break with just one application (ZScaler). By way of fixing it we came across this forum post which detailed the same symptoms as us and the fix suggested there (Set-ADFSProperties -IgnoreTokenBinding $True) did the trick for us. So what is this token binding thing?
    • Token Binding seems to be like cookies for HTTPS. I found this presentation to be a good explanation of it. Basically token binding binds your security token (like cookies or ADFS tokens) to the TLS session you have with a server, such that if anyone were to get hold of your cookie and try to use it in another session it will fail. Your tokens are bound to that TLS session only. I also found this medium post to be a good techie explanation of it (but I didn’t read it properly*). 
    • It seems to be enabled on the client side from Windows 10 1511 and upwards.
    • I saw the same recommendation in these Microsoft Docs on setting up Azure stack.

Some excerpts from the medium post (but please go and read the full one to get a proper understanding). The excerpt is mostly for my reference:

Most of the OAuth 2.0 deployments do rely upon bearer tokens. A bearer token is like ‘cash’. If I steal 10 bucks from you, I can use it at a Starbucks to buy a cup of coffee — no questions asked. I do not want to prove that I own the ten dollar note.

OAuth 2.0 recommends using TLS (Transport Layer Security) for all the interactions between the client, authorization server and resource server. This makes the OAuth 2.0 model quite simple with no complex cryptography involved — but at the same time it carries all the risks associated with a bearer token. There is no second level of defense.

OAuth 2.0 token binding proposal cryptographically binds security tokens to the TLS layer, preventing token export and replay attacks. It relies on TLS — but since it binds the tokens to the TLS connection itself, anyone who steals a token cannot use it over a different channel.

Lastly, I came across this awesome blog post (which too I didn’t read properly* – sorry to myself!) but I liked a lot so here’s a link to my future self – principles of token validation.

 

* I didn’t read these posts properly coz I was in a “troubleshooting mode” trying to find out why ADFS broke with token binding. If I took more time to read them I know I’d get side tracked. I still don’t know why ADFS broke, but I have an idea.

NSX Edge application rules to limit HTTP

Not a big deal, but something I tried today mainly as an excuse to try this with NSX.

So here’s the situation. I have a pair of ADFS WAP servers that are load balanced using NSX. ADFS listens on port 443 so that’s the only port my VIP needs to handle.

However, we are also using Verisign DNS to failover between sites. So I want it such that if say both the ADFS servers in Site A are down, then Verisign DNS should failover my external ADFS records to Site B. Setting this up in Verisign DNS is easy but you need to be able to monitor the ADFS / WAP services externally from Verisign for this to work properly. Thus I had to setup my NSX WAP load balancer to listen on port 80 too and forward those to my internal ADFS servers. To monitor the health of ADFS Verisign will periodically query http://myadfsserver/adfs/probe. If that returns a 200 response all is good.

Now here’s the requirement I came up with for myself. I don’t want any and every HTTP query on port 80 to be handled. Yes, if I try http://myadfserver/somerandomurl it gives a 404 error but I don’t want that. I want any HTTP queries to any URL except /adfs/probe to be redirected to /adfs/probe. Figured this would be a good place to use application rules. Here’s what I came up with:

NSX application rules follow the same format as HAProxy so their config guide is a very handy reference.

The acl keyword defines an Access Control List. In my case the name of this ACL is allowed_url (this is a name I can define) and it matches URLs (hence the url keyword). Since I used url it does an exact match, but there are derivatives like url_beg and url_end and url_len and url_dir etc. that I could have used. I can even do regexp matches. In my case I am matching for the exact URL /adfs/probe. I define this as the ACL called allowed_url.

In the next line I use the redirect keyword to redirect any requests to the /adfs/probe location if it does not match the allowed_url ACL. That’s all!

This of course barely touches what one can do with application rules, but I am pleased to get started with this much for now. :)

ADFS monitoring on NSX

Was looking at setting up monitoring of my ADFS servers on NSX.

I know what to monitor on the ADFS and WAP servers thanks to this article.

http://<Web Application Proxy name>/adfs/probe
http://<ADFS server name>/adfs/probe
http://<Web Application Proxy IP address>/adfs/probe
http://<ADFS IP address>/adfs/probe

Need to get an HTTP 200 response for these.

So I created a service monitor in NSX along these lines:

And I associated it with my pool:

Bear in mind the monitor has to check port 80, even though my pool might be on port 443, so be sure to change the monitor port as above.

The “Show Pool Statistics” link on the “Pools” section quickly tells us whether the member servers are up or not:

The show service loadbalancer pool command can be used to see what the issue is in case the monitor appears down. Here’s an example when things aren’t working:

Here’s an example when all is well:

Thanks to this document for pointing me in the right troubleshooting direction. Quoting from that document, the list of error codes:

UNK: Unknown

INI: Initializing

SOCKERR: Socket error

L4OK: Check passed on layer 4, no upper layers testing enabled

L4TOUT: Layer 1-4 timeout

L4CON: Layer 1-4 connection problem. For example, “Connection refused” (tcp rst) or “No route to host” (icmp)

L6OK: Check passed on layer 6

L6TOUT: Layer 6 (SSL) timeout

L6RSP: Layer 6 invalid response – protocol error. May caused as the:

Backend server only supports “SSLv3” or “TLSv1.0”, or

Certificate of the backend server is invalid, or

The cipher negotiation failed, and so on

L7OK: Check passed on layer 7

L7OKC: Check conditionally passed on layer 7. For example, 404 with disable-on-404

L7TOUT: Layer 7 (HTTP/SMTP) timeout

L7RSP: Layer 7 invalid response – protocol error

L7STS: Layer 7 response error. For example, HTTP 5xx

Nice!

[Aside] ADFS and Windows Proxy

Quick shout out to this blog post on where to set the Internet Proxy for ADFS. Basically, you gotta set it via netsh winhttp proxy.

Notes on ADFS

I have been trying to read on ADFS nowadays. It’s my new area of interest! :) Wrote a document at work sort of explaining it to others, so here’s bits and pieces from that.

What does Active Directory Federation Services (ADFS) do?

Typically when you visit a website you’d need to login to that website with a username/ password stored on their servers, and then the website will give you access to whatever you are authorized to. The website does two things basically – one, it verifies your identity; and two, it grants you access to resources.

It makes sense for the website to control access, as these are resources with the website. But there’s no need for the website to control identity too. There’s really no need for everyone who needs access to a website to have user accounts and passwords stored on that website. The two steps – identity and access control – can be decoupled. That’s what ADFS lets us do.

With ADFS in place, a website trusts someone else to verify the identity of users. The website itself is only concerned with access control. Thus, for example, a website could have trusts with (say) Microsoft, Google, Contoso, etc. and if a user is able to successfully authenticate with any of these services and let the website know so, they are granted access. The website itself doesn’t receive the username or password. All it receives are “claims” from a user.

What are Claims?

A claim is a statement about “something”. Example: my username is ___, my email address is ___, my XYZ attribute is ___, my phone number is ____, etc.

When a website trusts our ADFS for federation, users authenticate against the ADFS server (which in turn uses AD or some other pool to authenticate users) and passes a set of claims to the website. Thus the website has no info on the (internal) AD username, password, etc. All the website sees are the claims, using which it can decide what to do with the user.

Claims are per trust. Multiple applications can use the same trust, or you could have a trust per application (latter more likely).

All the claims pertaining to a user are packaged together into a secure token.

What is a Secure Token?

A secure token is a signed package containing claims. It is what an ADFS server sends to a website – basically a list of claims, signed with the token signing certificate of the ADFS server. We would have sent the public key part of this certificate to the website while setting up the trust with them; thus the website can verify our signature and know the tokens came from us.

Relying Party (RP) / Service Provider (SP)

Refers to the website/ service who is relying on us. They trust us to verify the identity of our users and have allowed access for our users to their services.

I keep saying “website” above, but really I should have been more generic and said Relying Party. A Relying Party is not limited to a website, though that’s how we commonly encounter it.

Note: Relying Party is the Microsoft terminology.

ADFS cannot be used for access to the following:

  • File shares or print servers
  • Active Directory resources
  • Exchange (O365 excepted)
  • Connect to servers using RDP
  • Authenticate to “older” web applications (it needs to be claims aware)

A Relying Party can be another ADFS server too. Thus you could have a setup where a Replying Party trusts an ADFS service (who is the Claims Provider in this relationship), and the ADFS service in turn trusts a bunch of other ADFS servers depending on (say) the user’s location (so the trusting ADFS service is a Relying Party in this relationship).

Claims Provider (CP) / Identity Provider (IdP)

The service that actually validates users and then issues tokens. ADFS, basically.

Note: Claims Party is the Microsoft terminology.

Secure Token Service (STS)

The service within ADFS that accepts requests and creates and issues security tokens containing claims.

Claims Provider Trust & Relying Party Trust

Refers to the trust between a Relying Party and Identity Provider. Tokens from the Identity Provider will be signed with the Identity Provider’s token signing key – so the Relying Party knows it is authentic. Similarly requests from the Relying Party will be signed with their certificate (which we can import on our end when setting up the trust).

Examples of setting up Relying Party Trusts: 1 and 2.

Claims Provider Trust is the trust relationship a Relying Party STS has with an Identity Provider STS. This trust is required for the Relying Party STS to accept incoming claims from the Identity Provider STS.

Relying Party Trust is the trust relationship an Identity Provider STS has with a Relying Party STS. This trust is requires for the Identity Provider STS to send claims to the Relying Party STS. 

Web Application Proxy (WAP)

Access to an ADFS server over the Internet is via a Web Application Proxy. This is a role in Server 2012 and above – think of it as a reverse proxy for ADFS. The ADFS server is within the network; the WAP server is on the DMZ and exposed to the Internet (at least port 443). The WAP server doesn’t need to be domain joined. All it has is a reference to the ADFS server – either via DNS, or even just a hosts file entry. The WAP server too contains the public certificates of the ADFS server.

Miscellaneous

  • ADFS Federation Metadata – this is a cool link that is published by the ADFS server (unless we have disabled it). It is https://<your-adfs-fqdn>/FederationMetadata/2007-06/FederationMetadata.xml and contains all the info required by a Replying Party to add the ADFS server as a Claims Provider.
    • This also includes Base64 encoded versions of the token signing certificate and token decrypting certificates.
  • SAML Entity ID – not sure of the significance of this yet, but this too can be found in the Federation Metadata file. It is usually of the form http://<your-adfs-fqdn>/adfs/services/trust and is required by the Relying Party to setup a trust to the ADFS server.
  • SAML endpoint URL – this is the URL where users are sent to for authentication. Usually of the form http://<your-adfs-fqdn>/adfs/ls.  This information too can be found in the Federation Metadata file.
  • Link to my post on ADFS Certificates.
  • Link to a nice post explaining most of the above and also about certificates.

[Aside] Misc ADFS links

Update: To test ADFS as an end-user, go to https://<adfsfqdn>/adfs/ls/IdpInitiatedSignon.aspx. Should get a page where you can sign in and select what trusts are present.

ADFS errors and WID

Spent a bit of time today tracking down an ADFS/ WID issue. Turned out to be a silly one in the end (silly on my part actually, should have spotted the cause right away!) but it was a good learning exercise in the end. 

The issue was that ADFS refused to launch after a server reboot. The console gave an error that it couldn’t connect to the configuration database. The ADFS service refused to start and the event logs were filled with errors such as these:

The Federation Service configuration could not be loaded correctly from the AD FS configuration database.

Additional Data
Error:
ADMIN0012: OperationFault

There was an error in enabling endpoints of Federation Service. Fix configuration errors using PowerShell cmdlets and restart the Federation Service.

Additional Data
Exception details:
System.ServiceModel.FaultException`1[Microsoft.IdentityServer.Protocols.PolicyStore.OperationFault]: ADMIN0012: OperationFault (Fault Detail is equal to Microsoft.IdentityServer.Protocols.PolicyStore.OperationFault).

A SQL operation in the AD FS configuration database with connection string Data Source=np:\\.\pipe\microsoft##wid\tsql\query;Initial Catalog=AdfsConfiguration;Integrated Security=True failed.

Additional Data

Exception details:
A network-related or instance-specific error occurred while establishing a connection to SQL Server. The server was not found or was not accessible. Verify that the instance name is correct and that SQL Server is configured to allow remote connections. (provider: Named Pipes Provider, error: 40 – Could not open a connection to SQL Server)

The last one repeated many times. 

I hadn’t installed the ADFS server in our firm so I had no clue how it was setup. Importantly, I didn’t know it used the Windows Internal Database (WID) which you can see in the error messages above. It is possible to have ADFS work with SQL for a larger setup but that wasn’t the case here. Following some blog posts on the Internet (this and this) I downloaded SQL Server Management Studio (SSMS) and tried connecting to the WID at the path given in the error (\\.\pipe\microsoft##wid\tsql\query). That didn’t work for me – it just gave me some errors that the SQL server was unreachable. 

BTW, according to one of the blog posts it is better to launch SSMS as the user who has rights to connect to the WID database (the service account under which your ADFS service runs for instance). That didn’t help in my case (not saying the advice is incorrect, my issue was something else). Found a Microsoft blog post too that confirmed I was connecting to the correct server name – \\.\pipe\microsoft##wid\tsql\query for Windows 2012 and above; \\.\pipe\MSSQL$MICROSOFT##SSEE\sql\query for Windows 2003 & 2008 – but no go. 

That’s when I realized the WID has its own service. I had missed this initially. Trying to start that gave an error that it couldn’t start due to a login failure. This service runs under an account NT SERVICE\MSSQL$MICROSOFT##WID and looks like it didn’t have logon as service rights. It looks like someone had played around with our GPOs (or moved this server to a different OU) and this account had lost its rights. 

The fix is simple – just give this account rights via GPO (or exclude the server from whatever GPO is fiddling with logon as a service rights; or move this server to some other OU). Since the NT SERVICE\MSSQL$MICROSOFT##WID is not a regular account, you can’t add it to GPO from any server (because the account is local and will only exist if WID is installed). So I opened GPMC on my ADFS server and modified the GPO to give this account logon as a service rights. 

Notes on ADFS Certificates

Was trying to wrap my head around ADFS and Certificates today morning. Before I close all my links etc I thought I should make a note of them here. Whatever I say below is more or less based on these links (plus my understanding):

There are three types of certificates in ADFS. 

The “Service communications” certificate is also referred to as “SSL certification” or “Server Authentication Certificate”. This is the certificate of the ADFS server/ service itself. 

  • If there’s a farm of ADFS servers, each must have the same certificate
  • We have the private key too for this certificate and can export it if this needs to be added to other ADFS servers in the farm. 
  • The Subject Name must contain the federation service name. 
  • This is the certificate that end users will encounter when they are redirected to the ADFS page to sign-on, so this must be a public CA issued certificate. 

The “Token-signing” certificate is the crucial one

  • This is the certificate used by the ADFS server to sign SAML tokens.
  • We have the private key too this certificate too but it cannot be exported. There’s no option in the GUI to export the private key. What we can do is export the public key/ certificate. 
    • ADFS servers in a farm can share the private key.
    • If the certificate is from a public or enterprise CA, however, then the key can be exported (I think).
  • The exported public certificate is usually loaded on the service provider (or relying party; basically the service where we can authenticate using our ADFS). They use it to verify our signature. 
  • The ADFS server signs tokens using this certificate (i.e. uses its private key to encrypt the token or a hash of the token – am not sure). The service provider using the ADFS server for authentication can verify the signature via the public certificate (i.e. decrypt the token or its hash using the public key and thus verify that it was signed by the ADFS server). This doesn’t provide any protection against anyone viewing the SAML tokens (as it can be decrypted with the public key) but does provide protection against any tampering (and verifies that the ADFS server has signed it). 
  • This can be a self-signed certificate. Or from enterprise or public CA.
    • By default this certificate expires 1 year after the ADFS server was setup. A new certificate will be automatically generated 20 days prior to expiration. This new certificate will be promoted to primary 15 days prior to expiration. 
    • This auto-renewal can be disabled. PowerShell cmdlet (Get-AdfsProperties).AutoCertificateRollover tells you current setting. 
    • The lifetime of the certificate can be changed to 10 years to avoid this yearly renewal. 
    • No need to worry about this on the WAP server. 
    • There can be multiple such certificates on an ADFS server. By default all certificates in the list are published, but only the primary one is used for signing.

The “Token-decrypting” certificate.

  • This one’s a bit confusing to me. Mainly coz I haven’t used it in practice I think. 
  • This is the certificate used if a service provider wants to send us encrypted SAML tokens. 
    • Take note: 1) Sending us. 2) Not signed; encrypted
  • As with “Token-signing” we export the public part of this certificate, upload it with the 3rd party, and they can use that to encrypt and send us tokens. Since only we have the private key, no one can decrypt en route. 
  • This can be a self-signed certificate. 
    • Same renewal rules etc. as the “Token-signing” certificate. 
  • Important thing to remember (as it confused me until I got my head around it): This is not the certificate used by the service provider to send us signed SAML tokens. The certificate for that can be found in the signature tab of that 3rd party’s relaying party trust (see screenshot below). 
  • Another thing to take note of: A different certificate is used if we want to send encrypted info to the 3rd party. This is in the encryption tab of the same screenshot. 

So – just to put it all in a table.

Clients accessing ADFS server Service communications certificate
ADFS server signing data and sending to 3rd party Token-signing certificate
3rd party encrypting data and sending ADFS server Token-decrypting certificate
3rd party signing data and sending ADFS server Certificate in Signature tab of Trust
ADFS server encrypting data and sending to 3rd party Certificate in Encryption tab of Trust

Update: Linking to a later post of mine on ADFS in general.

Update 2: Modification to the above table, using better terminology.

Clients accessing ADFS server Service communications certificate
We (Identity Provider STS) signing data and sending to a Relying Party STS Token-signing certificate
Relying Party STS encrypting data and sending us (Identity Provider STS) Token-decrypting certificate
[In a trust] Relying Party STS party signing data and sending us (Identity Provider STS) Certificate in Signature tab of Trust
[In a trust] We (Identity Provider STS) encrypting data and sending to Relying party STS Certificate in Encryption tab of Trust