Was looking at setting up monitoring of my ADFS servers on NSX.
I know what to monitor on the ADFS and WAP servers thanks to this article.
http://<Web Application Proxy name>/adfs/probe
http://<ADFS server name>/adfs/probe
http://<Web Application Proxy IP address>/adfs/probe
http://<ADFS IP address>/adfs/probe
Need to get an HTTP 200 response for these.
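Before relying on the NSX monitor, the same probe can be tried by hand. Below is a minimal Python sketch of that check, assuming the probe is answered over plain HTTP on port 80 as in the URLs above. The function names (`probe_url`, `probe_status`, `is_healthy`) and the 5-second timeout are my own choices for illustration, not anything ADFS or NSX defines.

```python
import urllib.error
import urllib.request

PROBE_PATH = "/adfs/probe"


def probe_url(host: str) -> str:
    """Build the plain-HTTP (port 80) probe URL for a host name or IP address."""
    return f"http://{host}{PROBE_PATH}"


def probe_status(host: str, timeout: float = 5.0, opener=urllib.request.urlopen):
    """Return the HTTP status code of the probe, or None if the host is unreachable.

    `opener` is injectable only so the function can be exercised without a network.
    """
    try:
        with opener(probe_url(host), timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # The server answered, just not with a success code (e.g. 400 or 503).
        return err.code
    except (urllib.error.URLError, OSError):
        # DNS failure, connection refused, timeout and similar.
        return None


def is_healthy(status) -> bool:
    """Only an HTTP 200 counts as a passing probe."""
    return status == 200
```

Calling `is_healthy(probe_status("<ADFS server name>"))` for each name and IP address listed above should return `True` if the monitor is going to pass.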
So I created a service monitor in NSX along these lines:
And I associated it with my pool:
Bear in mind the monitor has to check port 80, even though my pool might be on port 443, so be sure to change the monitor port as above.
The “Show Pool Statistics” link in the “Pools” section quickly tells us whether the member servers are up or not:
The show service loadbalancer pool command can be used to see what the issue is in case the monitor appears down. Here’s an example when things aren’t working:
NSX-edge-15-1> show service loadbalancer pool pool-adfs-https_443
-----------------------------------------------------------------------
Loadbalancer Pool Statistics:
POOL pool-adfs-https_443
| LB METHOD round-robin
| LB PROTOCOL L7
| Transparent disabled
| SESSION (cur, max, total) = (0, 0, 0)
| BYTES in = (0), out = (0)
+->POOL MEMBER: pool-adfs-https_443/ADFS01, STATUS: DOWN
| | HEALTH MONITOR = MONITOR SERVICE, default_adfs_monitor:CRITICAL
| | | LAST STATE CHANGE: 2018-02-20 10:02:13
| | | LAST CHECK: 2018-02-20 12:47:15
| | | FAILURE DETAIL: HTTP CRITICAL - Invalid HTTP response received from host: HTTP/1.1 400 Bad Request
| | SESSION (cur, max, total) = (0, 0, 0)
| | BYTES in = (0), out = (0)
Here’s an example when all is well:
NSX-edge-15-1> show service loadbalancer pool pool-adfs-https_443
-----------------------------------------------------------------------
Loadbalancer Pool Statistics:
POOL pool-adfs-https_443
| LB METHOD round-robin
| LB PROTOCOL L7
| Transparent disabled
| SESSION (cur, max, total) = (0, 0, 0)
| BYTES in = (0), out = (0)
+->POOL MEMBER: pool-adfs-https_443/ADFS01, STATUS: UP
| | HEALTH MONITOR = BUILT-IN, default_adfs_monitor:L7OK
| | | LAST STATE CHANGE: 2018-02-20 12:51:02
| | SESSION (cur, max, total) = (0, 0, 0)
| | BYTES in = (0), out = (0)
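If you capture this output regularly (over SSH, say), the interesting fields can be pulled out with a few regular expressions. The following is only a sketch built from the two sample outputs above. The patterns and the `parse_pool_members` name are my own, and other NSX versions may format the output differently.

```python
import re

# Patterns derived from the sample output in this post, not from any NSX spec.
MEMBER_RE = re.compile(
    r"POOL MEMBER:\s*(?P<pool>[^/\s]+)/(?P<member>[^,\s]+),\s*STATUS:\s*(?P<status>\w+)"
)
MONITOR_RE = re.compile(
    r"HEALTH MONITOR\s*=\s*[^,]+,\s*(?P<monitor>[^:\s]+):(?P<code>\w+)"
)
FAILURE_RE = re.compile(r"FAILURE DETAIL:\s*(?P<detail>.+)")


def parse_pool_members(output: str) -> list[dict]:
    """Return one dict per pool member with status, monitor code and failure detail."""
    members = []
    for line in output.splitlines():
        match = MEMBER_RE.search(line)
        if match:
            # A new member starts; monitor fields are filled in by later lines.
            members.append(
                {**match.groupdict(), "monitor": None, "code": None, "detail": None}
            )
            continue
        if not members:
            continue  # still in the pool header, nothing to attach to yet
        current = members[-1]
        match = MONITOR_RE.search(line)
        if match:
            current.update(match.groupdict())
            continue
        match = FAILURE_RE.search(line)
        if match:
            current["detail"] = match.group("detail").strip()
    return members


# The member section of the failing example above.
SAMPLE = """\
+->POOL MEMBER: pool-adfs-https_443/ADFS01, STATUS: DOWN
| | HEALTH MONITOR = MONITOR SERVICE, default_adfs_monitor:CRITICAL
| | | LAST STATE CHANGE: 2018-02-20 10:02:13
| | | LAST CHECK: 2018-02-20 12:47:15
| | | FAILURE DETAIL: HTTP CRITICAL - Invalid HTTP response received from host: HTTP/1.1 400 Bad Request
"""

for member in parse_pool_members(SAMPLE):
    print(member["member"], member["status"], member["code"], member["detail"])
```

For a healthy member there is no FAILURE DETAIL line, so `detail` stays `None`.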
Thanks to this document for pointing me in the right troubleshooting direction. Quoting from that document, the list of error codes:
UNK: Unknown
INI: Initializing
SOCKERR: Socket error
L4OK: Check passed on layer 4, no upper layers testing enabled
L4TOUT: Layer 1-4 timeout
L4CON: Layer 1-4 connection problem. For example, “Connection refused” (tcp rst) or “No route to host” (icmp)
L6OK: Check passed on layer 6
L6TOUT: Layer 6 (SSL) timeout
L6RSP: Layer 6 invalid response – protocol error. May be caused by:
Backend server only supports “SSLv3” or “TLSv1.0”, or
Certificate of the backend server is invalid, or
The cipher negotiation failed, and so on
L7OK: Check passed on layer 7
L7OKC: Check conditionally passed on layer 7. For example, 404 with disable-on-404
L7TOUT: Layer 7 (HTTP/SMTP) timeout
L7RSP: Layer 7 invalid response – protocol error
L7STS: Layer 7 response error. For example, HTTP 5xx
Nice!
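The code list above lends itself to a small lookup table. This is a rough sketch of one. The `describe` helper and the idea of reading the layer off the `L<n>` prefix are my own convenience, and treating only the "OK" codes as passing is an assumption on my part. Codes outside the quoted list, such as the CRITICAL seen in the failing example, are reported as unlisted.

```python
# Meanings copied from the quoted list; anything else is treated as unlisted.
STATUS_CODES = {
    "UNK": "Unknown",
    "INI": "Initializing",
    "SOCKERR": "Socket error",
    "L4OK": "Check passed on layer 4, no upper layers testing enabled",
    "L4TOUT": "Layer 1-4 timeout",
    "L4CON": "Layer 1-4 connection problem",
    "L6OK": "Check passed on layer 6",
    "L6TOUT": "Layer 6 (SSL) timeout",
    "L6RSP": "Layer 6 invalid response - protocol error",
    "L7OK": "Check passed on layer 7",
    "L7OKC": "Check conditionally passed on layer 7",
    "L7TOUT": "Layer 7 (HTTP/SMTP) timeout",
    "L7RSP": "Layer 7 invalid response - protocol error",
    "L7STS": "Layer 7 response error",
}

# Assumption: only these codes mean the check passed.
PASSING = {"L4OK", "L6OK", "L7OK", "L7OKC"}


def describe(code: str) -> tuple:
    """Return (layer, passed, meaning) for a health monitor status code."""
    code = code.strip().upper()
    # Codes such as L4CON or L7OK carry the layer as the digit after the "L".
    has_layer = len(code) > 1 and code[0] == "L" and code[1].isdigit()
    layer = int(code[1]) if has_layer else None
    meaning = STATUS_CODES.get(code, "not in the quoted list")
    return layer, code in PASSING, meaning


for code in ("L7OK", "L4CON", "CRITICAL"):
    print(code, describe(code))
```

The layer number is a quick hint about where to look: L4 codes point at connectivity, L6 at SSL, and L7 at the HTTP response itself.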