Using regex with KQL in Data Collection Rules to filter Conditional Access policies

Note: This post has had numerous updates since my initial entry. Please read till the end.

There’s Data Collection Rules. That’s what you can use in Azure Monitor when ingesting logs, to transform it. Like, for instance, you have the sign-in logs coming from Entra ID into a Log Analytics workspace – now, that’s got a lot of information and maybe you want to save some $$$s by reducing some of the noise. In our firm, for example, there’s some 45 or so Conditional Access policies and every entry in the sign-in logs (there are separate log tables for interactive, non-interactive, managed identities, and so on) has a bunch of entries per policy usually stating they are notApplied.

In the screenshot above, which is a small set, all I really care about in the logs are the first two policies (green); the rest don’t really matter to me.

To remove the unwanted stuff you can use Data Collection Rules. They can transform the logs before being ingested by the Log Analytics workspace. You do this transformation via KQL; however, not all KQL operators are supported!

My colleague and I decided to tackle this. Our initial idea was to do something like this in the transformation section:

This goes through every policy in the ConditionalAccessPolicies column, and creates a new column (still called ConditionalAccessPolicies) that contains the policy name and its status… but only for the policies who have a status of success or failure. (KQL is so neat! That little snippet there has a lot of power).

Unfortunately, it doesn’t work.

After a lot of trial and error we decided to use regex, because the extract_all operator is supported and that can extract entries based on a pattern. This is what I came up with:

Here’s the regex by itself: ({[^}]+"result":"(?:success|failure)"[^}]+})

Before I explain the regex, let me digress a bit into what we want to do.

The ConditionalAccessPolicies is a bit of JSON that looks like this:

What we do first is convert this into a string. Flattening it basically.

What I then want to do with this snippet is go through it and find blocks where the result is a success or failure. Like these ones:

Which should be straight forward to do via regex.

To begin with, I need a pattern that matches any string starting with ‘{‘ all the way up to ‘}‘ and that contains ‘"result":"success"‘ or ‘"result":"failure"‘.

So that’s what this does: {[^}]+"result":"(?:success|failure)"[^}]+}

After the ‘{‘ match, however, we have this bit: [^}]+. That’s needed because we want to avoid the match being greedy. If I had done {.+"result":"(?:success|failure)".+} for instance, the .+ pattern being “one or more instances of any character”, the regex would have started from the first policy {"id":"2404b19c-c013-4a15-9d1e-1fe2f67ccbbd" and matched all the way up to the last policy "conditionsNotSatisfied":2}. Because that is a valid match for this pattern.

So that’s why instead of the .+ pattern I change it to [^}]+ which basically means any character that’s not “}” (that’s what [^}] means) and do one or more matches of that (hence the +). I use this same non-greedy way of matching both at the beginning and end: {[^}]+"result":"(?:success|failure)"[^}]+}

Next: the whole pattern ({[^}]+"result":"(?:success|failure)"[^}]+}) is within the brackets. That’s what puts into groups the bit that’s matched. This is standard regex, and something mentioned in the KQL specific regex document too. What extract_all() does is create an array out of all these groups that match.

Something new I learnt today is this bit: (?:success|failure). What I want to do is match for both success or failure. Typically I’d just put them in brackets like so: "result":"(success|failure)" but that didn’t work as extract_all() created separate groups. I fooled around a bit until I read the docs and realized the (?: operator does the non-capturing group.

So that’s basically it! This bit of regex ({[^}]+"result":"(?:success|failure)"[^}]+}) matches one or more occurances of {...}  that contain "result":"success" or "result":"failure". These are then put into an array, which we assign to ConditionalAccessPolicies and thus you have a limited set of the policies in ConditionalAccessPolicies instead of everything.

Neat! Sometimes you get a challenge that combines all of the niche things you are interested in – regex and KQL in this case! 😍 And a bit of Conditional Access policies thrown in on the side…

Update (an hour later): I blogged too quick! While testing further we realized that this changes the type of the ConditionalAccessPolicies column entries. While the type is still an array, the type of the individual entries in the array are now string. (The gettype() function can be used to find the type). For the default ConditionalAccessPolicies, each entry in the array is of type dictionary.

Update (many hours later!): I think I figured it out. The issue is that the output from extract_all() escapes all the quotes.  So {"id":"2404b19c-c013-4a15-9d1e-1fe2f67ccbbd", ...} becomes \"{\"id\":\"2404b19c-c013-4a15-9d1e-1fe2f67ccbbd\", ...}\". This results in the extracted JSON snippets not being converted into the dictionary type but remain as string instead, and while the portal UI helpfully expands it when viewing, commands like mv-expand don’t work with this.

When testing my code against some real data I was able to work it around like this:

But when doing the same with transformation rules, there’s an issue because there’s no replace_string() function there. Only replace(). So I changed it thus:

The replace() function uses regexs, that’s why the pattern is a bit different.

And that works! So I can now do the transforms like this:

Fingers crossed I don’t discover some other issue after this… 🤞🤞

Update (next day): A one-liner thanks to my colleague. I hate one-liners as they are not readable, but all the cool kids love them, so here goes:

😍

Update (next day, a few hours later): Aargh, you can’t make changes to existing columns! So change the above slightly.

I don’t like this coz it changes the column name, but part of me understands the reason.😐 This way it’s clear the column is a custom one. What’s irritating though is that the same doesn’t happen if you remove a column. I can easily remove an existing column, and no one using the workspace is the wiser.

Update (two days later!): My colleague suggested it would be good to have a count of the total number of policies too.

Update (three days later!): This is turning out to be the blog post I don’t stop updating. 😄

The ConditionalAccessPolicies column is not of type dynamic in non-interactive sign-in logs. It is of type string. (Why!? Don’t ask! We discovered this a while ago when combining both logs as part of troubleshooting and realized that many columns have the same name across both tables but different types. So we always do something like this:

Others too have observed this.

Anyways, because of this when I do ConditionalAccessPolicies_Count_CF = array_length(ConditionalAccessPolicies) in the code one above it doesn’t actually return anything for non-interactive logs. Instead we have to cast it to dynamic first, and then get the count. Like so:

Update (four days later!): Of course I have an update.

What about policies in report only mode? These have four possible states.

I found the values for these from this page for the appliedConditionalAccessPolicy resource.

Fun-fact – unknownFutureValue isn’t just a place holder for some unknown future value. In our tests, we found there’s actually a few policies that actually show this value. They were in report-only mode and here’s an example of one of them in the portal sign-in logs:

Oh, and the policy is actually in report only mode, but the picture above shows “disabled”. Go figure!

(The two instances we found were from B2B collaboration accounts, so maybe this state is related to such users).

So here’s a new version: