Azure Functions and port exhaustion

All of this is already on the Internet, in the Microsoft docs themselves. But I want to put it in a post so I can refer to it easily in the future.

If it's not already obvious from all the Azure Function posts this week, I have been working on something that reads Security Events from an Event Hub and makes an HTTP request to a 3rd party to push the logs to them. The Azure Function that does this reads from the Event Hub, selects the specific Computers and Event IDs we are interested in, and only pushes those out. So I can't batch the requests, as I have to deal with each entry as soon as I am done with it.
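The selection step described above could look roughly like the sketch below. It is in Python purely as an illustration (my actual function is PowerShell), and the field names `Computer` and `EventID`, the host names, and the event IDs are all assumptions made up for the example, not the real schema or watch list.

```python
from typing import Iterable, Iterator

# Hypothetical watch lists; the real computers and event IDs are not shown in this post.
WATCHED_COMPUTERS = {"dc01.example.com", "dc02.example.com"}
WATCHED_EVENT_IDS = {4624, 4625, 4740}


def is_interesting(event: dict) -> bool:
    """Return True if the event comes from a watched computer and has a watched ID."""
    computer = str(event.get("Computer", "")).lower()
    return computer in WATCHED_COMPUTERS and event.get("EventID") in WATCHED_EVENT_IDS


def select_events(events: Iterable[dict]) -> Iterator[dict]:
    """Yield only the events that should be pushed to the third party."""
    for event in events:
        if is_interesting(event):
            yield event
```

Each event that survives the filter is then sent on its own, which is why every matching entry ends up costing one outbound HTTP request.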

After creating the Azure Function I also set up Application Insights for it. I had no idea what it would offer me, but the Portal suggested I set it up when I went to the Monitor tab and I figured why not. It has some cool things, like providing you with an Application map and, more importantly for me, a Transaction search function. In my function I spit out some output if things go well or throw a Write-Error in case of errors, and I realized I could easily search these logs in this section or even see all the Exceptions (which would show only the Write-Error output). Nice!

I learnt from here that my Function was throwing out a lot of errors like these:

This is the result of a Write-Error $_.Exception block in my code right after an Invoke-WebRequest.

Googling this brought me to this page on the various outbound limits. From this section (emphasis mine):

Azure uses source network address translation (SNAT) and Load Balancers (not exposed to customers) to communicate with public IP addresses. Each instance on Azure App service is initially given a pre-allocated number of 128 SNAT ports. The SNAT port limit affects opening connections to the same address and port combination. If your app creates connections to a mix of address and port combinations, you will not use up your SNAT ports. The SNAT ports are used up when you have repeated calls to the same address and port combination. Once a port has been released, the port is available for reuse as needed. The Azure Network load balancer reclaims SNAT port from closed connections only after waiting for 4 minutes.

When applications or functions rapidly open a new connection, they can quickly exhaust their pre-allocated quota of the 128 ports. They are then blocked until a new SNAT port becomes available, either through dynamically allocating additional SNAT ports, or through reuse of a reclaimed SNAT port. If your app runs out of SNAT ports, it will have intermittent outbound connectivity issues.

In my case, since the outbound requests were all to the same endpoint, I was hitting this limit. I could only make about 100 connections before exhausting the ports.
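To put rough numbers on this, here is a back-of-the-envelope sketch in Python. It uses only the two figures from the quoted docs (128 pre-allocated ports per instance, 4 minutes before a closed connection's port is reclaimed). It assumes every request opens a fresh connection to the same endpoint and closes it straight away, and it ignores the dynamic allocation of extra ports that the docs mention, so treat the output as an estimate rather than a guarantee.

```python
SNAT_PORTS_PER_INSTANCE = 128    # pre-allocated ports per instance (from the quoted docs)
RECLAIM_DELAY_SECONDS = 4 * 60   # a closed connection's port is reclaimed after 4 minutes


def sustainable_rate(instances: int = 1) -> float:
    """New connections per second that can be kept up indefinitely.

    Assumption: each connection holds its port only for the reclaim delay.
    """
    return instances * SNAT_PORTS_PER_INSTANCE / RECLAIM_DELAY_SECONDS


def seconds_until_exhaustion(new_connections_per_second: float, instances: int = 1) -> float:
    """Time until the pre-allocated ports run out at a given connection rate.

    Returns infinity when the rate is at or below the sustainable rate.
    """
    if new_connections_per_second <= sustainable_rate(instances):
        return float("inf")
    return instances * SNAT_PORTS_PER_INSTANCE / new_connections_per_second
```

Under these assumptions a single instance sustains about 0.53 new connections per second (roughly 32 per minute), and a burst of 10 new connections per second uses up the 128 ports in about 13 seconds.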

There are some suggestions for avoiding the problem, including connection pooling. It looks like PowerShell 6 and above don't do that (based on this GitHub issue, specifically this comment). The only other alternatives in my case seemed to be to add additional instances or to change my code so it batches requests before sending them over. (I could do the latter, but it's a bit tricky.)
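To make the connection pooling idea concrete, here is a minimal sketch in Python (not the PowerShell my function uses, and not something I have run in Azure). It starts a throwaway local HTTP server as a stand-in for the third-party endpoint and counts how many distinct client source ports it sees when each request opens a new connection versus when one connection is reused. The names `PortRecorder`, `push_events` and `count_source_ports`, and the `/logs` path, are made up for the example.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class PortRecorder(BaseHTTPRequestHandler):
    """Local stand-in for the third-party endpoint; records client source ports."""

    protocol_version = "HTTP/1.1"  # lets the server keep connections alive
    source_ports = set()

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        self.rfile.read(length)  # consume the request body
        PortRecorder.source_ports.add(self.client_address[1])
        self.send_response(204)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def push_events(host, port, events, reuse_connection):
    """POST each event separately, with or without reusing one TCP connection."""
    conn = None
    for event in events:
        if conn is None:
            conn = http.client.HTTPConnection(host, port, timeout=5)
        conn.request("POST", "/logs", body=event.encode("utf-8"))
        conn.getresponse().read()  # drain the response so the connection can be reused
        if not reuse_connection:
            conn.close()
            conn = None
    if conn is not None:
        conn.close()


def count_source_ports(reuse_connection, n_events=20):
    """Return how many distinct client ports the local server saw."""
    PortRecorder.source_ports = set()
    server = ThreadingHTTPServer(("127.0.0.1", 0), PortRecorder)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        host, port = server.server_address
        push_events(host, port, ["{}"] * n_events, reuse_connection)
        return len(PortRecorder.source_ports)
    finally:
        server.shutdown()
        server.server_close()
```

Calling `count_source_ports(True)` should report a single source port for all 20 requests, while `count_source_ports(False)` reports many. On App Service each of those extra source ports would map to a SNAT port, which is the resource that runs out.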

For now I added additional instances to my Function, but I don't know if that's a viable long-term solution. A rule of thumb from that document seemed to be that for every 100 connections or so to the same endpoint I could run into this issue.

Another thing I learnt is how to use the Diagnostics menu to identify SNAT port exhaustion. Useful stuff. It gives you two useful stats: pending SNAT connections and failed SNAT connections. The pending SNAT warning is not too problematic (I've seen those while my HTTP requests still succeeded), so the failed SNAT count is the one to keep an eye on. Once I increased the number of instances these went away. It even shows you the SNAT usage per instance:

As you can see each instance gets about 128 ports. One of them for some reason has 256 and another has 192… dunno why! 🤷‍♂️

Additionally, and this is just an FYI to myself and not related to the current issue, there is also a larger limit on all connections across the VM (the App Service plan), which varies with its size:

Update: Came across a good blog post explaining SNAT port exhaustion and App Services.