Creative Commons Attribution 4.0 International License
© Rakhesh Sasidharan

Reading & Listening Updates

  • Started reading (on my Kindle) “Jonathan Strange & Mr. Norrell” by Susanna Clarke. Wow, never imagined I’d read a book like this and love it. I am hooked on the olden English used by the author and the way she writes – the long descriptions, details, footnotes, etc. Reads like a children’s novel from a long-ago age. I am about 25% done. Looking forward to finishing it.
  • Going to start listening, on Audible, to “The Dead Mountaineer’s Inn: One More Last Rite for the Detective Genre” by the Strugatsky brothers. I listened to the introduction. Sounds like an interesting book; fingers crossed.
  • Listened to this episode of the Vector podcast. It’s an interview by Rene Ritchie of Ashraf Eassa, who is an expert on CPUs, and it’s a good listen.
  • Speaking of podcasts, I came across (and loved the first episode of) a new podcast from Microsoft. It’s called Behind the Tech with Kevin Scott. Going to listen to the second episode next.

Life and all that…

I went traveling to a different country recently. The day after reaching there I realized that I had loose motions. Not sure what I ate that disagreed with me, as I was fine all night long and only had this sudden urge to rush to the toilet an hour after I woke up (during which time I ate nothing). Coincidentally, on the day that I was leaving for this place I had also packed my Imodium tablets (they put an end to loose motions, they are amazing!). I hadn’t packed them for the trip, but when I woke up that day I had a headache and so decided to pack some Panadol just in case. Looking for extra supplies in my medicine drawer I chanced upon a pack of Imodium which was expiring this month, and so took it along just in case. And the very next day it turned out to be useful. “How lucky!” my mind thought.

But is that luck? Wouldn’t luck have been not having loose motions in the first place? I am “lucky” in that I generally don’t get loose motions when traveling (in fact the last time I remember getting loose motions on a trip was maybe 4-5 years ago) so considering I had to get loose motions, yes I was lucky that events conspired such that I took the tablets along; but I would have been luckier if I just didn’t get loose motions at all.

I suppose I should be thanking God for the stroke of luck. But then again, couldn’t God just have prevented me from getting loose motions in the first place (either through a more resilient stomach or just by pointing me away from whatever food triggered them)? If God is all-seeing and thus all this is predestined, then I was meant to get loose motions and also meant to carry tablets along – so this is all just pointless, no?

I don’t get a God who is all-seeing and does pointless things just for self-amusement or something. A better explanation is that there is no predestination and God isn’t all-seeing. Things just happen, but God is someone who is more aware of the probabilities and multiple futures. I too as a Human can calculate these, but I am nothing compared to God, who is able to calculate things infinitely better and with more nuance perhaps. He can then nudge me along such that I am better prepared for things. This is the God in M. Night Shyamalan’s “Signs” – the God who arranged things such that the family was able to kill the aliens, and the preacher’s brother narrowly escaped having his kiss messed up forever by the girl puking into his mouth.

So what makes things happen? I dunno. Random chance I guess. Is God able to influence things? I dunno. If he can influence things directly – like send a thought to my head to take medicines or avoid a certain food – then it doesn’t make much sense as he can avoid a lot of the drama of life by just making sure certain things don’t happen. So direct influencing doesn’t make sense to me, it has to be indirect. He can arrange for things to sway the probabilities this way or the other perhaps, but what eventually happens is not His direct doing. Thus He can arrange for me to wake up with a headache (make sure I have a bad sleep and I will wake up thus) then arrange for things to be such that I open my medicine cabinet to look for headache medicines and see the stomach upset medicines; but whether I actually take the medicines along or not depends on me. That bit is my decision, he can only arrange things so I have a decision to make.

Thus God is not responsible for my actions or those of others. But He can influence actions – both mine and others. Because I wake up with a headache chances are I might go pick up a fight with someone or be unnecessarily rude, but that choice still rests with me and what I do is my responsibility. The cards are stacked against me but it is my free will to act out. A lot of times I will fall into His trap and act poorly, a few times I will act better. This too has His influence written all over it because the person I am today is a result of my past and the events there, and if He so influenced my past events to be one which has left me full of negative thoughts and a depressed nature chances are I will react very poorly to the events in my life (making my future prospects poor in turn). He has slowly nudged the probabilities to be better able to nudge me the way He wants. I was probably a clean slate when born (sort of, coz my environment and parents etc too matter of course) but over time He is able to influence my actions more.

One could split God into the God and Devil I guess. One tries to do good by you, the other bad. Or there could even be a host of Beings I guess. I mean who knows. Even if there’s just one or a whole lot, the question is why should any one or more of these care for me? Why should they try to do good by me (take medicines, for instance) or bad (eat food that causes a stomach upset)? What’s their stake in it? Is it because I pray, and so God wants to do good by me and the Devil wants to hurt me; or am I just a pawn in a game between them where things have to happen for the sake of the drama? I dunno. I don’t like to think in terms of black and white, so this concept of God and Devil sounds rubbish to me. I prefer to think of things this way: one, there’s Life, which is the random happening of events; two, there’s Beings that can influence things one way or the other because they can see more of the big picture in space and time; and three, there’s us Humans (and other Animals?) who can make individual choices which may not necessarily be easy to make because the Beings I mentioned could influence our decisions, but we nevertheless have free will and so the decisions are ours in the end, and these in turn feed into Life and affect others and have an interplay with everything. Life + Beings is what one would refer to as the Tao I guess.

Why do these Beings do what they do? Are they just impartial beings or themselves influenced by other things? Maybe they are influenced by Humans too via their deeds and prayers (or lack thereof)? Maybe Beings influence each other, maybe the random events of Life influence these Beings too? My “guardian angel” Being (for lack of a better term) had some interest in ensuring I don’t have a shitty trip (literally haha!) so ensured I took tablets along. Other Beings have negative interests towards me for whatever reasons so They ensure a lot of other things don’t go well with me. Or maybe it’s the same Being who both didn’t want to spoil my trip but otherwise has it out for me in certain matters so spoils them and helps out in other matters however He can. Who knows.

Part of me knows that this is me anthropomorphizing things. Things happened, I was lucky to take tablets along, and now I am trying to find a reason to explain things. Big deal! But I don’t think so. I don’t feel life is entirely random. A lot of times things seem to have a pattern. It’s like a fractal. Seems very complex and varied, but there’s a kernel of a pattern which influences the overall structure. I find life to be like that. It’s a mix of random (the capital L Life) and some non-random ordering (God; Beings) working together.

A sensible philosophy for mental peace would be to accept things and find understanding, but it doesn’t work that way. It is irritating when things don’t work out even though you may put in a lot of effort, or things always seem to go a certain way as if there’s some bad luck or jinxing involved. What does one do here? Keep trying? Work harder? Pray!? :) I don’t know. I don’t think praying or pointlessly trying is an answer. But I do feel that one must try as much as one can, without getting frustrated. Try because that’s in one’s nature, but couple it with an understanding perhaps that there is a non-random element at work too, which for whatever reason nudges things around, so you may not always get what you want (and will sometimes get things when you least expect it). If there are some non-random sequences of events which work to ensure that you are never stuck without loose motion pills in case you get loose motions during a trip, then there are also non-random sequences of events which will work to ensure the probabilities are always stacked against you in certain aspects of life, and however much you try things will always seem to work against you. There’s nothing you can do about the latter, but that doesn’t mean you give up. The same way these Beings can’t make you win by taking tablets, They can’t make you fail either. The only time you really fail is when you fall down and stay down (I am paraphrasing this from something I read). As long as you get up, even though your legs might be broken, you haven’t decided to fail. At the end you might have failed because the cards were stacked against you, but you (the capital H Human in this game of life) haven’t given up.

Before wrapping up, something I want to add as a reminder to my future self reading this. If things have conspired to make it a bad day and are nudging you to make bad decisions, remember the final decision is still yours. It is hard to resist because of all the environmental factors, but remember you have a choice and try to exert it.

PS. Typing this post on my iPhone from the beautiful (but hot – wrong time to visit!) Armenia. Maybe all these thoughts are thanks to the monastery visits, or the long hours spent half asleep, half dreaming in the car rides to these monasteries. I didn’t know Armenia had such a rich heritage until I visited the place. I knew of Armenia somewhere in the back of my mind but didn’t realize how old and historically rich it was.

[Aside] Under the Hood with DAGs

Watching this Ignite 2015 video: Under the Hood with DAGs, by Tim McMichael.

Adding some links here to supplement the video:

  • Tuning Failover Cluster Network thresholds – useful when you have stretched DAGs
  • The mystery of the 9223372036854775766 copy queue… – never had this one but good info.
    • Basically, the cluster registry keeps track of the last log number per database and also the timestamp. When a node wants to see its copy queue length (i.e. how far behind it is in terms of processing the logs) it can compare this log number with the log number it has actually processed. Sometimes, however, a node might have issues updating or reading the cluster registry and so it falls behind in terms of receiving updates. In such cases the last log number will match what it has processed, but this is actually outdated info, and so if the Exchange Replication service on the server hosting the passive copy notices that the timestamp is 12 minutes old it puts its database copy into self-protection mode. This is done by manually setting the copy queue length (a.k.a. CQL) to roughly 9 quintillion (about the maximum value a signed 64-bit integer can hold). No one can actually have such a large copy queue length, so it’s as good a number to choose as any.
    • The video suggests rebooting each node until you find one which might be holding updates. But the link above suggests a different method.
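The self-protection behaviour described above can be sketched as a little model. This is purely illustrative – the function and field names are mine, not the Replication service’s – but it shows the idea of reporting a sentinel CQL when the cluster-registry data looks stale:

```python
from datetime import datetime, timedelta

# Sentinel reported when cluster-registry data looks stale: close to the
# largest value a signed 64-bit integer can hold (the "9 quintillion" CQL).
STALE_CQL_SENTINEL = 2**63 - 1
STALENESS_THRESHOLD = timedelta(minutes=12)

def copy_queue_length(last_log_generated, last_log_replayed,
                      registry_updated_at, now):
    """Report the copy queue length, or the sentinel if the cluster
    registry hasn't been updated recently (self-protection mode)."""
    if now - registry_updated_at > STALENESS_THRESHOLD:
        return STALE_CQL_SENTINEL
    return last_log_generated - last_log_replayed

now = datetime(2018, 8, 1, 12, 0)
# Fresh registry data: a believable copy queue length.
print(copy_queue_length(1050, 1000, now - timedelta(minutes=1), now))
# Stale registry data: the sentinel value.
print(copy_queue_length(1050, 1050, now - timedelta(minutes=20), now))
```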

DAC

Came across Datacenter Activation Coordination (DAC) posts on Tim’s blog: part 1, followed by a series of posts you can see linked at the end of part 1.

DAC mode works by using a bit stored in memory by Active Manager called the Datacenter Activation Coordination Protocol (DACP). DACP is simply a bit in memory set to either a 1 or a 0. A value of 1 means Active Manager can issue mount requests, and a value of 0 means it cannot.

The starting bit is always 0, and because the bit is held in memory, any time the Microsoft Exchange Replication service (MSExchangeRepl.exe) is stopped and restarted, the bit reverts to 0. In order to change its DACP bit to 1 and be able to mount databases, a starting DAG member needs to either:

  • Be able to communicate with any other DAG member that has a DACP bit set to 1; or
  • Be able to communicate with all DAG members that are listed on the StartedMailboxServers list.

The second bullet – being able to communicate with all DAG members on the StartedMailboxServers list – is the important one. If you read his blog post you’ll see why. If DAC is activated and you are starting up a previously shut-down DAG, even though the DAG might have quorum it will not start up if some members are still offline. (I had missed that when reading about DAC earlier). To summarize it succinctly from part 2 of his series:

Remember, with DAC mode enabled, different rules apply for mounting databases on startup. The starting DAG member must be able to participate in a cluster that has quorum, and it must be able to communicate with another DAG member that has a DACP value of 1 or be able to communicate with all DAG members listed on the StartedMailboxServers list.
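That startup rule condenses into a tiny decision function. A hedged sketch – the names and shapes here are mine, not Exchange’s:

```python
def can_mount_databases(has_quorum, peer_dacp_bits,
                        started_members, reachable_members):
    """Model of the DAC rule quoted above: the starting member must be in a
    cluster with quorum AND either see a peer whose DACP bit is 1, or see
    every member on the StartedMailboxServers list."""
    if not has_quorum:
        return False
    sees_dacp_one = any(bit == 1 for bit in peer_dacp_bits.values())
    sees_all_started = set(started_members) <= set(reachable_members)
    return sees_dacp_one or sees_all_started

# DAG has quorum, but one started member (EX4) is still offline and no
# peer has flipped its DACP bit yet -> databases stay dismounted.
print(can_mount_databases(True, {"EX2": 0, "EX3": 0},
                          ["EX1", "EX2", "EX3", "EX4"],
                          ["EX1", "EX2", "EX3"]))
```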

Here’s highlights from some of the interesting posts in Tim’s series:

  • Part 4 has info on the steps to do a datacenter switchover and the cmdlets available when DAC is enabled. Essentially: 1) Stop-DatabaseAvailabilityGroup with the -ConfigurationOnly:$TRUE switch for the site that is down – this marks the servers in the down site as down, 2) Stop-Service CLUSSVC on the nodes in the site that is up, and finally 3) Restore-DatabaseAvailabilityGroup specifying the site that is up. This Microsoft doc on datacenter switchovers is worth reading side-by-side. It contains info on both DAC and non-DAC scenarios, so watch out for that.
  • Part 5 has info on how to use the Start-DatabaseAvailabilityGroup cmdlet to set the DACP bit as 1 on a specified server thus bringing up the DAG by forcing a consensus.
  • Part 6 is an interesting story. A nice edge case of DAC being enabled and graceful shutdown.
  • Part 8 is another interesting story on what happens due to a typo in a cmdlet.

Very briefly, the DAC cmdlets:

  • Stop-DatabaseAvailabilityGroup – marks a specified server, or all servers in a specified AD site, as down. Use the -ConfigurationOnly switch to mark the server as down in AD only, without actually doing anything on the server(s). You need to use this switch if the servers are already offline but AD is up and accessible in that site. This cmdlet also forces a sync of AD across sites so the information is propagated.
  • Start-DatabaseAvailabilityGroup – same as above, but marks as up. Can use the -ConfigurationOnly switch to only mark in AD without actually doing anything.
  • Restore-DatabaseAvailabilityGroup – evicts any stopped servers, can configure the DAG to use an alternate witness server, and brings up the DAG after doing this. This cmdlet can only be used against a DAG with DAC enabled.

Dynamic Quorum

Came across dynamic quorum from the videos (wasn’t previously aware of it). Am being lazy and will put in some screenshots from the video:

The highlighted part is the key thing.

Remember that quorum is defined as “(the number of votes)/2 + 1“. Each node (or witness) typically has a single vote, and (number of votes)/2 is rounded down (i.e. 7/2 = 3.5, rounded down to 3).
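In code, the formula is just floor division:

```python
def quorum(total_votes):
    """Votes needed for a majority: (number of votes)/2, rounded down, plus 1."""
    return total_votes // 2 + 1

print(quorum(7))  # 4 - e.g. 6 nodes + a witness
print(quorum(3))  # 2
```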

With dynamic quorum once a node (or set of nodes) fail, and if the remaining set of nodes form a quorum (note – they have to form quorum), then the required quorum of the cluster is adjusted to reflect the remaining number of nodes.

Take a look at the scenario below:

We have two data centers. 6 nodes + a witness, so initially the quorum was 7/2 + 1 = 4.

The link between the two data centers goes down. Data center B has 3 nodes, which is below the quorum of 4, so all 3 nodes shut down. Data center A has 3 nodes + witness, thus meeting the quorum, and it stays up.

At this point if any further node in data center A goes down, the remaining nodes will fall below the quorum and the cluster will shut down. This is where dynamic quorum comes in. With dynamic quorum (introduced in Server 2012), when the nodes in data center A form quorum the new quorum requirement is recalculated as 4/2 + 1 = 3.

If a node goes down in data center A, leaving 2 nodes + a witness, they meet the new quorum of 3 and the cluster stays up. The quorum then gets revised to 3/2 + 1 = 2. If yet another node goes down, the remaining node + witness still meets the new quorum of 2 and so the cluster continues to stay up.
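The walk-through above can be replayed as a small simulation: at each step the survivors must meet the quorum that was in force before the failure, and the requirement then shrinks. (A sketch of the arithmetic only, not of the actual cluster service.)

```python
def quorum(total_votes):
    return total_votes // 2 + 1

votes = 7                   # 6 nodes + witness; quorum(7) = 4
surviving = 4               # data center B (3 nodes) lost; A keeps 3 nodes + witness
assert surviving >= quorum(votes)
votes = surviving           # dynamic quorum: requirement now quorum(4) = 3

surviving = 3               # one node in A fails: 2 nodes + witness remain
assert surviving >= quorum(votes)
votes = surviving           # requirement now quorum(3) = 2

surviving = 2               # another node fails: 1 node + witness remain
assert surviving >= quorum(votes)
print("cluster still up with", surviving, "votes")
```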

Another slide:

Two data centers, 2 nodes + 1 node, no witness; the quorum is therefore 3/2 + 1 = 2.

One of the nodes in data center A goes down. Since the number of remaining nodes meets quorum, the cluster can stay up. But since there is no file share witness, each node cannot be given an equal vote (I wasn’t aware of this). Thus the cluster service picks one of the nodes (the one with the lowest node ID) and gives it a vote of 0. The node in data center A has a vote of 0. The new quorum is thus 1/2 + 1 = 1.

If the link between the two data centers goes down, the node in data center B stays up even though the node in data center A could otherwise have formed quorum! Nothing wrong with it, just an edge case to keep in mind, as chances are you wanted data center A to remain up – that’s why you provisioned two nodes there in the first place.

Now for a variant in which there is a witness:

So two data centers, 2 nodes + 2 nodes, 1 witness in data center A; the quorum is therefore 5/2 + 1 = 3.

As before, one of the nodes in data center A goes down. Since there are 3 nodes + witness remaining, they meet quorum and the cluster continues. The new quorum is 4/2 + 1 = 3. Again, the data center link goes down. Everything goes down! :) Why? Coz neither side has a clear majority. Each side has 2 votes, not the 3 required.

Interestingly I have this setup at work. So a critical thing to keep in mind is that if I were to update & reboot the witness or one of the nodes in data center A (my preferred data center), and the WAN link were to go down – I could lose the cluster! No such problems if I update & reboot a node in data center B and the link goes down, as data center A has the majority. Funny, it’s like you must keep the witness in the less preferred data center.

Windows Server 2012R2 improves upon dynamic quorum by adding dynamic witness.

So if the number of votes is odd, the witness vote is removed. And if the witness is offline or failed, its vote is removed too (that includes reboots too, right?).

Now things get tricky.

Going back to the previous example: so two data centers, 2 nodes + 2 nodes, 1 witness in data center A; the quorum is therefore 5/2 + 1 = 3.

As before, a node in data center A goes down (the picture is a bit incorrect as I skipped some intermediate slides); the remaining nodes have quorum so the cluster stays put. The new quorum is 4/2 + 1 = 3. But since the number of nodes is now ODD, the cluster service removes the witness from the vote calculations. So the new quorum turns out to be 3/2 + 1 = 2. At this point if the link goes down, the nodes in data center B have quorum and so they form a cluster, while the remaining node in data center A is shut down. So unlike the Server 2012 case, which had no dynamic witness, the whole cluster does not go down!
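A sketch of that 2012 R2 behaviour, with the witness vote toggled off whenever the node vote count is odd (the function names are mine, for illustration):

```python
def quorum(total_votes):
    return total_votes // 2 + 1

def witness_votes(node_votes):
    """Dynamic witness: the witness only votes when the number of node
    votes is even, keeping the total odd."""
    return 1 if node_votes % 2 == 0 else 0

# 2 + 2 nodes plus a witness in data center A.
nodes_a, nodes_b = 2, 2
# One node in data center A fails: 3 node votes -> odd -> witness vote removed.
nodes_a = 1
total = nodes_a + nodes_b + witness_votes(nodes_a + nodes_b)   # 3 votes
needed = quorum(total)                                         # 2
# WAN link drops: B's 2 votes meet quorum; A's 1 node (+ non-voting witness) does not.
print(nodes_b >= needed, nodes_a + witness_votes(nodes_a + nodes_b) >= needed)
```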

Now, going back to the case where one of the nodes (not the witness) had its vote removed because there was only one node in each data center: I mentioned that the node with the lowest node ID gets its vote zeroed. The next two slides talk about that, including how to select a node in a cluster whose vote we’d preferentially like to remove in such situations.

At this point I’d also like to link to this Microsoft doc on cluster quorum. Am going to quote some parts from there as they explain things well and I’d like to keep them here as a reference for myself.

How cluster quorum works

When nodes fail, or when some subset of nodes loses contact with another subset, surviving nodes need to verify that they constitute the majority of the cluster to remain online. If they can’t verify that, they’ll go offline.

But the concept of majority only works cleanly when the total number of nodes in the cluster is odd (for example, three nodes in a five node cluster). So, what about clusters with an even number of nodes (say, a four node cluster)?

There are two ways the cluster can make the total number of votes odd:

  1. First, it can go up one by adding a witness with an extra vote. This requires user set-up.
  2. Or, it can go down one by zeroing one unlucky node’s vote (happens automatically as needed).

I didn’t know about point 2 until watching this video.

Worth bearing in mind that this also applies in the case of the witness being lost. So any time your witness is offline the cluster service automatically zeroes the vote of one of the nodes. If you have 2 nodes in each data center + a witness in one data center, and you reboot the witness – that is fine. One of the nodes will have its vote zeroed out, but there’s no impact and when the witness returns the zeroed out node gets its vote back. But if during the time your witness is rebooting you also have a network outage between the two data centers, then the data center with majority nodes (i.e. not the data center containing the node whose vote was zeroed) wins and the cluster fails over there.

Some more:

Dynamic witness

Dynamic witness toggles the vote of the witness to make sure that the total number of votes is odd. If there are an odd number of votes, the witness doesn’t have a vote. If there is an even number of votes, the witness has a vote. Dynamic witness significantly reduces the risk that the cluster will go down because of witness failure. The cluster decides whether to use the witness vote based on the number of voting nodes that are available in the cluster.

Dynamic quorum works with Dynamic witness in the way described below.

Dynamic quorum behavior

  • If you have an even number of nodes and no witness, one node gets its vote zeroed. For example, only three of the four nodes get votes, so the total number of votes is three, and two survivors with votes are considered a majority.
  • If you have an odd number of nodes and no witness, they all get votes.
  • If you have an even number of nodes plus witness, the witness votes, so the total is odd.
  • If you have an odd number of nodes plus witness, the witness doesn’t vote.
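Those four rules reduce to a small function for the total number of votes. This is a sketch of the stated rules only – the real cluster service also adjusts for which nodes are actually up:

```python
def total_votes(node_count, has_witness):
    """Total voting members after the cluster makes the count odd."""
    if has_witness:
        # Witness votes only when the node count is even (dynamic witness).
        return node_count + (1 if node_count % 2 == 0 else 0)
    # No witness: with an even node count, one node's vote is zeroed.
    return node_count if node_count % 2 == 1 else node_count - 1

print(total_votes(4, False))  # 3 - one node's vote zeroed
print(total_votes(4, True))   # 5 - witness votes
print(total_votes(5, True))   # 5 - witness vote removed
```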

Am pretty sure I’m going to forget all this a few days from today, so I’ll link to the docs again as they go into more detail and have examples etc.

[Aside] Stretchly – Break time reminder

Via the always resourceful How-To Geek I came across Stretchly and Big Stretch Reminder – two programs that remind you to take breaks and micro-breaks. I have previously used WorkRave but wanted to try something different now. This time I’ll try Stretchly, as I like its UI and it is open-source.

[Aside] Various Exchange 2013 links

I am reading up on Exchange 2013 nowadays (yes, I know, a bit late in the day to be doing that considering it is going out of support :) and these are some links I want to put here as a bookmark to myself. Some excellent blog posts and videos that detail the changes in Exchange 2013.

(By way of background: I am not an Exchange admin. I am Exchange 2010 certified as I have a huge interest in Exchange, and as part of preparing for the certification I attended a course, set up a lab on my laptop, and even began this blog to start posting about my adventures with it. I never got to work with Exchange 2010 at work – except as a helpdesk administrator, one could say – but I am familiar with the concepts even though I have forgotten more than I remember. I have dabbled with Exchange 2000 before that. Going through these links and videos is like a trip down memory lane – seeing concepts that I was once familiar with but have since changed for the better. Hopefully this time around I get to do more Exchange 2013 work! Fingers crossed).

If you don’t like reading, start with this video.

Alternatively, start with these links but I’d strongly recommend watching the above video once you finish reading.

Preferred Architecture

From the preferred architecture link I’d like to highlight this point about DAG design as I wasn’t aware of it (PA == Preferred Architecture; this is also discussed in the video):

Data resiliency is achieved by deploying multiple database copies. In the PA, database copies are distributed across the site resilient datacenter pair, thereby ensuring that mailbox data is protected from software, hardware and even datacenter failures.

Each database has four copies, with two copies in each datacenter, which means at a minimum, the PA requires four servers. Out of these four copies, three of them are configured as highly available. The fourth copy (the copy with the highest Activation Preference) is configured as a lagged database copy. Due to the server design, each copy of a database is isolated from its other copies, thereby reducing failure domains and increasing the overall availability of the solution as discussed in DAG: Beyond the “A”.

The purpose of the lagged database copy is to provide a recovery mechanism for the rare event of system-wide, catastrophic logical corruption. It is not intended for individual mailbox recovery or mailbox item recovery.

The lagged database copy is configured with a seven day ReplayLagTime. In addition, the Replay Lag Manager is also enabled to provide dynamic log file play down for lagged copies. This feature ensures that the lagged database copy can be automatically played down and made highly available in the following scenarios:

  • When a low disk space threshold is reached
  • When the lagged copy has physical corruption and needs to be page patched
  • When there are fewer than three available healthy copies (active or passive) for more than 24 hours

By using the lagged database copy in this manner, it is important to understand that the lagged database copy is not a guaranteed point-in-time backup. The lagged database copy will have an availability threshold, typically around 90%, due to periods where the disk containing a lagged copy is lost due to disk failure, the lagged copy becoming an HA copy (due to automatic play down), as well as, the periods where the lagged database copy is re-building the replay queue.

With all of these technologies in play, traditional backups are unnecessary; as a result, the PA leverages Exchange Native Data Protection.

The last line made me smile. Never thought I’d read someplace that backups for Exchange are unnecessary! :) If you have a lagged copy database, then you can enable circular logging on the database (this only affects the non-lagged copies) and skip taking backups – or at least not worry about the database dismounting because your backups are failing and logs are filling up disk space!

So what’s a lagged database copy? Basically it’s a copy of the database (in a DAG) that lags behind other members by a specified duration (the maximum is 14 days). So if the other servers in your DAG have some issue, rather than restore the database from backup you can simply “play down” the lagged database copy (i.e. tell that copy to process all the transaction logs it already has and thus become up-to-date) and activate it. Neat, huh. I want to delve a bit more into this, so check out this “Lagged copy enhancements” section from the Exchange 2013 HA improvements page.
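As a toy model of the lag: the passive copy receives logs immediately but only replays the ones older than its ReplayLagTime, while playing down replays everything queued. Illustrative code only, not how the Replication service is actually written:

```python
from datetime import datetime, timedelta

REPLAY_LAG_TIME = timedelta(days=7)

def logs_to_replay(replay_queue, now, play_down=False):
    """replay_queue is a list of (log_name, created_at) tuples."""
    if play_down:
        return [name for name, _ in replay_queue]      # catch up fully
    return [name for name, created in replay_queue
            if now - created >= REPLAY_LAG_TIME]       # respect the lag

queue = [("E0000000001.log", datetime(2018, 7, 20)),
         ("E0000000002.log", datetime(2018, 7, 28))]
now = datetime(2018, 7, 30)
print(logs_to_replay(queue, now))                  # only the 10-day-old log
print(logs_to_replay(queue, now, play_down=True))  # everything
```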

First there’s Safety Net (it’s not related to lagged copies, but it plays along with them in a cool way, so worth pointing out):

Safety Net is a feature of transport that replaces the Exchange 2010 feature known as transport dumpster. Safety Net is similar to transport dumpster, in that it’s a delivery queue that’s associated with the Transport service on a Mailbox server. This queue stores copies of messages that were successfully delivered to the active mailbox database on the Mailbox server. Each active mailbox database on the Mailbox server has its own queue that stores copies of the delivered messages. You can specify how long Safety Net stores copies of the successfully delivered messages before they expire and are automatically deleted.

Ok – so each mailbox server has a queue for each of its active databases (remember lagged copies are active too, just that they have a higher activation preference number and hence are not preferred). This queue contains messages that were delivered. Even after a message is delivered to a user, Safety Net can keep it around. You get to specify how long a message is kept for. Cool! Next up is this cool integration:

With the introduction of Safety Net, activating a lagged database copy becomes significantly easier. For example, consider a lagged copy that has a 2-day replay lag. In that case, you would configure Safety Net for a period of 2 days. If you encounter a situation in which you need to use your lagged copy, you can suspend replication to it, and copy it twice (to preserve the lagged nature of the database and to create an extra copy in case you need it). Then, take a copy and discard all the log files, except for those in the required range. Mount the copy, which triggers an automatic request to Safety Net to redeliver the last two days of mail. With Safety Net, you don’t need to hunt for where the point of corruption was introduced. You get the last two days mail, minus the data ordinarily lost on a lossy failover.

Whoa! So when a lagged copy is mounted, it asks Safety Net to redeliver all messages in the specified period – so as long as your Safety Net period matches your lagged database copy’s replay lag, if you mount the lagged copy, Safety Net will redeliver all the messages since then. (It’s cool, but yeah, I can imagine users complaining about a whole bunch of unread messages, missing Sent Items, etc. – but it’s cool, I like it for the geek factor). :)

To re-emphasize something that was mentioned earlier:

Lagged copies can now care for themselves by invoking automatic log replay to play down the log files in certain scenarios:

  • When a low disk space threshold is reached
  • When the lagged copy has physical corruption and needs to be page patched
  • When there are fewer than three available healthy copies (active or passive only; lagged database copies are not counted) for more than 24 hours

Lagged copy play down behavior is disabled by default, and can be enabled by running the following command.

After being enabled, play down occurs when there are fewer than three copies. You can change the default value of 3, by modifying the following DWORD registry value.

HKLM\Software\Microsoft\ExchangeServer\v15\Replay\Parameters\ReplayLagManagerNumAvailableCopies

To enable play down for low disk space thresholds, you must configure the following registry entry.

HKLM\Software\Microsoft\ExchangeServer\v15\Replay\Parameters\ReplayLagLowSpacePlaydownThresholdInMB

After configuring either of these registry settings, restart the Microsoft Exchange DAG Management service for the changes to take effect.

As an example, consider an environment where a given database has 4 copies (3 highly available copies and 1 lagged copy), and the default setting is used for ReplayLagManagerNumAvailableCopies. If a non-lagged copy is out-of-service for any reason (for example, it is suspended, etc.) then the lagged copy will automatically play down its log files in 24 hours.
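That example works out like this in a hedged sketch of the trigger logic. The function is my own; the thresholds are the ones quoted above:

```python
from datetime import timedelta

NUM_AVAILABLE_COPIES = 3  # default of ReplayLagManagerNumAvailableCopies

def should_play_down(healthy_copies, below_threshold_for,
                     low_disk=False, needs_page_patch=False):
    """True when any of the quoted play-down triggers fires."""
    too_few_copies = (healthy_copies < NUM_AVAILABLE_COPIES
                      and below_threshold_for >= timedelta(hours=24))
    return low_disk or needs_page_patch or too_few_copies

# 3 HA copies + 1 lagged; one HA copy suspended for a full day:
print(should_play_down(2, timedelta(hours=24)))  # True - plays down
print(should_play_down(2, timedelta(hours=2)))   # False - not yet 24 hours
```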

For future reference, this doc has steps on how to mount a lagged database copy – i.e. if you are not relying on the automatic play down behavior. You can manually activate a lagged copy via the Move-ActiveMailboxDatabase cmdlet with the -SkipLagChecks switch.

However, it is recommended you first suspend the copy (i.e. make it not “active”) and make a copy of the database and logs just in case.

Optionally, if you want to recover to a specific point in time you’d 1) suspend the database copy, 2) make a copy of it just in case, 3) move elsewhere all log files created after the time you want to recover to, 4) delete the checkpoint file, 5) run eseutil to recover the database – this is what replays the remaining logs and brings the database up to the point in time you want – and 6) move the database elsewhere to use as a recovery database for a restore. After this you move back the log files previously moved away, and resume the database copy. This blog post has a bit more detail but it is more or less the same as the Microsoft doc. Note: I’ve never ever done this, so all this is more info for future me. :)

Lastly, that doc also has info on how to activate a lagged copy using Safety Net. Step 4 of the instructions made no sense to me.

Moving on … (but pointing to this HA TechNet link again coz it has a lot of other info that I skipped here).

Outlook Anywhere & OWA behind a WAP server

Some links around publishing Exchange namespaces such as OWA and Outlook Anywhere externally via a WAP server:

The easiest thing to do is pass through everything via the WAP to the internal URL. But if you want, you can set up OWA authentication via ADFS claims. A step-by-step official guide is here, but the two links above cover the same stuff.

Healthcheck.htm

Exchange 2013 has a new monitoring architecture. When monitoring via a load balancer, one can use a “healthcheck.htm” URL to test the health of each virtual directory (corresponding to each of the user-consumed services). This URL is per virtual directory; here’s an example from Citrix on how to add monitors for each service in NetScaler:

If the service is up the URL returns an HTTP 200 OK.
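
Since the URL is per virtual directory, here's a quick sketch to spit out the monitor URLs for a server – handy for pasting into load balancer configs. The vdir names below are my assumptions of the usual user-facing ones; adjust for your environment.

```python
# Generate the healthcheck.htm URL for each Exchange 2013 virtual directory.
# The vdir names are the common user-facing ones - verify against your setup.

VDIRS = ["owa", "ecp", "ews", "Microsoft-Server-ActiveSync",
         "oab", "rpc", "Autodiscover", "mapi"]

def healthcheck_urls(server: str) -> list[str]:
    return [f"https://{server}/{vdir}/healthcheck.htm" for vdir in VDIRS]

for url in healthcheck_urls("mail.example.com"):
    print(url)
```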

Virtual Directory cmdlets

Speaking of virtual directories, if any of the PowerShell Get- cmdlets for virtual directories are slow, this blog post tells you why and what to do about it. These are the cmdlets, and the workaround is to add the switch -ADPropertiesOnly (this makes the cmdlet query the AD Config partition for the same info rather than query each server’s IIS Metabase, which is slower):

  • Get-WebServicesVirtualDirectory
  • Get-OwaVirtualDirectory
  • Get-ActiveSyncVirtualDirectory
  • Get-AutodiscoverVirtualDirectory
  • Get-EcpVirtualDirectory
  • Get-PowerShellVirtualDirectory
  • Get-OabVirtualDirectory
  • Get-OutlookAnywhere

Update: Thought I’d add more videos and links to this post than make separate posts.

Transport Architecture

Check out this talk: https://channel9.msdn.com/events/TechEd/2013/OUC-B319. Slides available online here. I wanted to put a screenshot of the transport components as a quick reference to myself in this post:

So the CAS has a stateless SMTP service, affectionately called FET, or Front-End Transport.

The MBX has both a stateful and a stateless SMTP service, called Transport and Mailbox Transport respectively. (Transport replaces the Hub Transport role of Exchange 2010).

There’s no longer a Hub Transport role. Previously the Hub Transport role on one server could directly talk to the store of another server – thus there were no clear layers. Everything was messy and tied up using RPC. Now there are clear layers as below, and communication between servers happens at the protocol layer. Within a server, communication goes up & down the layers; across servers it uses protocols like SMTP, EWS, and MRS proxy. Everything is clean.

Some slides on all three components:

Outbound email from the transport component on a MBX server can go out directly to an external SMTP server, or it can be delivered to the FET on any CAS server in the same site. This delivery happens on port 717 and needs to be specifically enabled.

The Transport component listens on port 25 if MBX and CAS are on separate servers. If they are co-located it listens on port 2525, as the CAS is already listening on 25. These ports are for accepting messages from the FET. For accepting messages from the Mailbox Transport component, it listens on port 465.

Remember that Transport is stateful.

The destination can be a CAS server or another Transport component (on another MBX server). The Transport component is what does the lookup of the mailbox database.

Last component: Mailbox Transport. This is the component that actually talks to the next layer in the mailbox server. This talks MAPI and receives emails from the Transport component. This is also the component that does the message conversion (TNEF to MIME and vice versa). No extensibility at this component as all that is at the Transport component. Once a message reaches Mailbox Transport there are no changes happening to it!
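
As a quick self-reference, here are the ports from the talk above in code form – this is just my notes condensed, so double-check against your actual receive connectors before relying on it.

```python
# Quick reference of the transport listener ports as I noted them from the
# talk - verify against your environment's receive connectors.

def transport_listen_port(colocated_with_cas: bool) -> int:
    """Port the (stateful) Transport component listens on for the FET."""
    return 2525 if colocated_with_cas else 25

PORTS = {
    "FET (CAS, stateless SMTP)": 25,
    "Transport <- FET (separate servers)": 25,
    "Transport <- FET (colocated with CAS)": 2525,
    "Transport <- Mailbox Transport": 465,
    "Transport -> FET outbound proxy": 717,   # needs to be specifically enabled
}

print(transport_listen_port(colocated_with_cas=True))   # 2525
print(transport_listen_port(colocated_with_cas=False))  # 25
```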

Reading Updates

This is going to be a short one, but I wanted to put it down anyways. :)

Uncommon Types (audiobook)

Written and narrated by Tom Hanks, I had high hopes for this one. And it started off well too. The first story was amazing. A bit cliched in certain parts, but good nevertheless. The second story started off well as a Christmas Eve family story but ended up being about war reminiscences. That’s fine, can look past. I forget what the third and fourth stories were about – I know the third was about an actor on a press junket world tour, but both stories are easily forgettable. I think I began listening to the fifth story and left it … There was no investment from my side in any of the stories. It just felt pointless continuing with them.

The Great Train Robbery (audiobook)

Written by Michael Crichton, narrated by Michael Kitchen (who plays DCS Foyle in “Foyle’s War”, a must-watch murder-mystery TV show set during World War II). I listened to the first two chapters but had to leave it as I didn’t like the narration and the content seemed too “heavy”. I think I was expecting a story, but this book was more non-fiction. And while the narration was skilled, I didn’t take to it because the narrator’s voice was too intense. There was a lot of drama and emphasis in the words. Difficult to explain, but that’s what I was referring to in one of my earlier posts when I said I sometimes prefer a narrator who just reads out the story with minimal emoting, letting my brain do the play-acting.

The Dark Tower: The Gunslinger (book + audiobook)

Dunno if I mentioned this before but I have been reading this for a while. Mostly the physical book, but I bought the audiobook too for when I feel tired of “reading”. Maybe it’s my age (hah!) or the times (not used to reading), but I get tired fast if I read for a while, so it is easier to use an audiobook as a crutch for when I need a helping hand. I’ve read the majority of the book, but I also re-read the first quarter of the book by listening to the audiobook version; and occasionally I have re-read a chapter by listening to the audiobook, or skipped a chapter or two entirely in the book and listened to them instead. The audiobook is narrated by George Guidall, who is amazing and whom I have mentioned in my earlier posts.

Update 19 July: I stopped reading this book today. Pity coz I was nearly done and was beginning to think I might not mind sci-fi and fantasy after all. But the book was a drag. Too much thinking. Every scene, every line had so many undertones and meanings to it. No one just spoke or did something – there was always an inflection to it. A note in the voice or a thought behind the action. Goodness! Plus I was beginning to lose interest in what the whole thing was about. I read till the section on the slow mutants and Roland’s coming-of-age story and left it. I guess I had different expectations from this book. It wasn’t as verbose as Stephen King’s later works. Terse statements. Too much drama. It was just too much. I listened to the audiobook for the last few chapters hoping that would be better – but nope, same thing. Eventually I went to Wikipedia to see if there’s any point to the story – nope! I guess a few books later it gets better but I don’t care nor do I have the patience. Sci-fi and fantasy aren’t for me, I should just get used to it!

An Accidental Death (audiobook)

Started this one yesterday. Written by Peter Grainger, narrated by Gildart Jackson (listening to him for the first time). So far so good – it seems to be a slow police procedural and I am liking what I am hearing.

Update 17 July: Finished it. Good book! Loved it. The last chapter was a bit too much – guitar playing and all, but whatever, to each his own. Was thinking of buying the next one in the series but some Audible reviews put me off. I’ll wait before spending a credit on them.

[Aside] NameCheap CSR generator

I have previously mentioned the DigiCert CSR utility and how I generate CSRs via it and also export the private key. Today I came across a site from NameCheap that does CSR generation and also key conversion etc. Nice one! Also worth reading this article from them on extracting private keys.

For completeness’ sake, here’s how to do the same via OpenSSL.
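
A minimal sketch (the subject fields and hostname below are made-up examples – fill in your own):

```shell
# Generate a new 2048-bit RSA key and a CSR in one go. -nodes leaves the key
# unencrypted (same as what the DigiCert/NameCheap tools export), and -subj
# makes it non-interactive.
openssl req -new -newkey rsa:2048 -nodes \
  -keyout server.key -out server.csr \
  -subj "/C=AE/O=Example Org/CN=mail.example.com"

# Sanity check the CSR before submitting it to the CA:
openssl req -in server.csr -noout -subject
```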

[Aside] Various DPM 2016 links

Reading up on (and trying to work with) DPM 2016 nowadays, so here’s some links to myself before I close them from the browser:

Copy the path and save it in Notepad. It’ll look like the following:

E:\ on DPM2016TP5-01.contoso.local C:\Program Files\Microsoft System Center 2016\DPM\DPM\Volumes\Replica\31d8e7d7-8aff-4d54-9a45-a2425986e24c\d6b82768-738a-4f4e-b878-bc34afe189ea\Full\E-Vol\

The first part of the copied string is the source. The second part, separated by a whitespace, is the destination. The destination contains the following information:

DPM Install Folder    C:\Program Files\[..]\DPM\Volumes\Replica\
Physical ReplicaID    31d8e7d7-8aff-4d54-9a45-a2425986e24c\
Datasource ID         d6b82768-738a-4f4e-b878-bc34afe189ea\
Path                  Full\E-Vol\
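
The breakdown above can be automated – here's a little sketch that splits the copied string into its parts. It assumes the "<source> <DPM path>" shape shown in the example, nothing more.

```python
# Parse the replica path string copied out of DPM into its components,
# per the breakdown above. Assumes the "<source> C:\<dpm path>" shape.

def parse_replica_string(s: str) -> dict:
    source, dest = s.split(" C:\\", 1)      # source is before the DPM path
    dest = "C:\\" + dest
    marker = "\\Volumes\\Replica\\"
    install, rest = dest.split(marker, 1)
    replica_id, datasource_id, *path = rest.rstrip("\\").split("\\")
    return {
        "source": source,
        "install_folder": install + marker,
        "physical_replica_id": replica_id,
        "datasource_id": datasource_id,
        "path": "\\".join(path) + "\\",
    }

s = ("E:\\ on DPM2016TP5-01.contoso.local "
     "C:\\Program Files\\Microsoft System Center 2016\\DPM\\DPM\\Volumes\\Replica\\"
     "31d8e7d7-8aff-4d54-9a45-a2425986e24c\\d6b82768-738a-4f4e-b878-bc34afe189ea\\"
     "Full\\E-Vol\\")
parts = parse_replica_string(s)
print(parts["physical_replica_id"])  # 31d8e7d7-8aff-4d54-9a45-a2425986e24c
print(parts["path"])                 # Full\E-Vol\
```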

 

TIL: Network access: Restrict clients allowed to make remote calls to SAM

Today I learnt of this setting. I was seeing messages like the following on a couple of my servers and read the link:

1 remote calls to the SAM database have been denied in the past 900 seconds throttling window.
For more information please see http://go.microsoft.com/fwlink/?LinkId=787651.

This part gives you a gist of the matter:

The SAMRPC protocol makes it possible for a low privileged user to query a machine on a network for data. For example, a user can use SAMRPC to enumerate users, including privileged accounts such as local or domain administrators, or to enumerate groups and group memberships from the local SAM and Active Directory. This information can provide important context and serve as a starting point for an attacker to compromise a domain or networking environment.

To mitigate this risk, you can configure the Network access: Restrict clients allowed to make remote calls to SAM security policy setting to force the security accounts manager (SAM) to do an access check against remote calls. The access check allows or denies remote RPC connections to SAM and Active Directory for users and groups that you define.

By default, the Network access: Restrict clients allowed to make remote calls to SAM security policy setting is not defined. If you define it, you can edit the default Security Descriptor Definition Language (SDDL) string to explicitly allow or deny users and groups to make remote calls to the SAM. If the policy setting is left blank after the policy is defined, the policy is not enforced.

The default security descriptor on computers beginning with Windows 10 version 1607 and Windows Server 2016 allows only the local (built-in) Administrators group remote access to SAM on non-domain controllers, and allows Everyone access on domain controllers. You can edit the default security descriptor to allow or deny other users and groups, including the built-in Administrators.
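
To get a feel for what those SDDL strings look like, here's a toy reader for the format. The example string is purely illustrative (an allow ACE granting Remote Control access to Builtin Administrators, which is what the policy describes) – check the actual default on your own box rather than trusting my example.

```python
# Toy reader for SDDL strings of the shape this policy setting takes.
# The example below is illustrative only, not the verbatim OS default.
import re

WELL_KNOWN = {"BA": "Builtin Administrators", "WD": "Everyone",
              "SY": "Local System", "BG": "Builtin Guests"}

def aces(sddl: str):
    """Yield (ace_type, rights, trustee) for each simple ACE in the DACL."""
    for ace_type, rights, sid in re.findall(r"\((\w+);;(\w+);;;([\w-]+)\)", sddl):
        yield ace_type, rights, WELL_KNOWN.get(sid, sid)

example = "O:BAG:BAD:(A;;RC;;;BA)"   # hypothetical: allow (A) RC to BA
for ace in aces(example):
    print(ace)
```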

So it looks like in my case some remote computer was trying to access this server’s SAM database (this is a Server 2016 machine, BTW) and the calling account wasn’t in the local Administrators group of this server.

PowerShell: listing snap-ins and their cmdlets

I know what to do when it comes to finding a list of available modules in PowerShell (Get-Module -ListAvailable) and a list of cmdlets in each module (Get-Command -Module <name>). But I don’t use snap-ins much, so the equivalents for those aren’t always at my fingertips. So this blog post is a reminder to future me:

To list all registered snap-ins:

Get-PSSnapin -Registered

And to find the cmdlets in a particular snap-in:

Get-Command | Where-Object { $_.PSSnapin -match "<snap-in name>" }

Of course replace -match with -eq if you know the snap-in’s exact name.

Update: I am not sure (coz I don’t recollect my steps) but maybe I need to do an Add-PSSnapin <name> first before Get-Command.

Internet not working in Chrome but works fine in IE

Today, Internet browsing via Chrome stopped working at my office. IE was not affected, only Chrome. The error was just that the site couldn’t be reached.

I fired up Chrome and went to “chrome://net-internals/”. In the page that opened I went to the “Proxy” section in the left side pane and saw that although the original proxy settings were “auto detect”, the effective proxy settings were “direct”. That didn’t make sense – Chrome was set to use the proxy settings of IE, and IE was working fine and detecting a proxy, but Chrome wasn’t. A quick Google search showed me that if Chrome has trouble finding a proxy, it resorts to a direct Internet connection. Seems to be by design. So why was Chrome having trouble finding a proxy? IE was set with a WPAD file location, so I went to the “Events” section in the side pane of “chrome://net-internals/” to see if Chrome was having trouble finding the WPAD file. It wasn’t, but there were errors like these:

The line referred to was the last line of the WPAD file, so clearly Chrome was reading the file and there was something wrong with its syntax. I opened up the file in Notepad++, set the language to JavaScript (so I get syntax highlighting and brace matching etc.), and went through the various script blocks in the file. Sure enough, one section had a missing ending brace “}” and that was tripping up Chrome. Not sure why IE was able to move past this error, but there you go. I added the missing brace and Chrome began working. :)
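
For reference, here's what a well-formed WPAD/PAC skeleton looks like – every function block needs its closing brace or Chrome silently falls back to DIRECT. The hostnames and proxy below are made up for illustration, and the shExpMatch stand-in is my own (browsers provide the real one).

```javascript
// Minimal WPAD/PAC file skeleton. Hostnames/proxy are illustrative only.
function FindProxyForURL(url, host) {
  // Internal hosts go direct
  if (shExpMatch(host, "*.internal.example.com")) {
    return "DIRECT";
  }                                  // <- the kind of brace my file was missing
  return "PROXY proxy.example.com:8080";
}

// shExpMatch is normally provided by the browser; crude stand-in so the
// logic can be exercised outside a browser:
function shExpMatch(str, pattern) {
  const re = new RegExp("^" + pattern.replace(/\./g, "\\.").replace(/\*/g, ".*") + "$");
  return re.test(str);
}

console.log(FindProxyForURL("http://x", "wiki.internal.example.com")); // DIRECT
console.log(FindProxyForURL("http://x", "www.google.com"));
```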

“The Outsider”

Just wrapped up an 18-and-a-half-hour listen of Stephen King’s “The Outsider”. Probably my longest audiobook yet.

“The Outsider” is good but not great. It has its moments though, and the story gets better and picks up pace as it goes along. It’s very “procedural” and you can think of it as a piece of tapestry slowly woven together by the various threads that are its chapters. That part is good. I don’t mind verbosity and lots of detail and meandering etc., and it’s good to see everything slowly come into place and fit together.

What didn’t make this great for me, though, is that it is less in the vein of his “Bill Hodges” trilogy, although the book is meant to be a spiritual successor to it, I think, in that it’s a murder mystery and has one of those characters – Holly Gibney – play a central role in this book. I loved the “Bill Hodges” trilogy. They had the right pace and mystery for me, and while the third one was less mystery and more of Mr. King’s usual supernatural stuff, I didn’t mind it and it gelled with the rest of the books. “The Outsider” continues this by taking the supernatural up a notch.

Holly Gibney was amazing. It was nice how towards the middle of the book he just introduces her into the story. Didn’t expect that but I wasn’t too surprised coz I think I read somewhere that she plays a part in this book. (I purchased the audiobook as it was narrated by Will Patton and seemed to be a murder mystery like the “Bill Hodges” trilogy). Holly took charge of things once she was introduced and slowly got everyone to see the big picture and “believe” in the Outsider. All that plot development was great. The few ending chapters were good too – not much action just a slow putting us Dear Readers back on the floor after taking us for this journey.

[Aside] Query remote RDP sessions and kill them

If you want to query the remote RDP sessions on a machine (qwinsta being the standard built-in tool for this):

qwinsta /server:<computername>

And to disconnect (rather, reset) a session:

rwinsta <session ID> /server:<computername>

Azure and DAG IP – not pingable on all nodes

If you are running an Exchange DAG in Azure you won’t always be able to ping the DAG IP. In fact, an IP-less DAG seems to be the recommendation.

Crazy thing is, I can’t find any install guide or advice for Exchange on Azure. There’s plenty of documentation on setting up an Exchange 2013 DAG witness in Azure for use with on-prem Exchange, but nothing on actually setting up a DAG in Azure (like what you’d find for SQL AlwaysOn in Azure, for example). This is a good article though for a non-techie introduction.

Another thing to bear in mind with Azure is that all communication between VMs – including those in the same subnet – happens via a gateway. If you check the ARP output of your Azure VMs you will see that all IPs are being intercepted by a gateway. So if the gateway doesn’t know of an IP, it won’t route it. This is why for SQL AlwaysOn you need to set up the availability group IPs on the Azure load balancer, thus making Azure aware of the IP.

In my case we have a two-site setup in Azure, and I noticed that I was able to ping the DAG IP when the PAM was in the DR site but not when it was in the main site. First I suspected some routing issue. Then I realized that, hang on, the DAG IP was configured in each site, but since it was manually assigned rather than assigned via a load balancer, it was attached to a single server NIC in each site. Thus, for instance, node01 in both sites had the DAG IP assigned to it while node02 in both sites did not. It so happened that when my PAM failed over to the DR site it moved to node01, which had the DAG IP (of that site’s subnet) assigned, while when it failed back to the primary site it happened to choose node02, which did not have the DAG IP. Simple! Elementary, but I didn’t realize this was what was happening as I didn’t make the PAM role move to each node to see if it behaved differently.

Sometimes you got to let go of the big picture and see the small stuff. :)

[Aside] Easily switch between multiple audio outputs using SoundSwitch

Via the always helpful How-To Geek – if you have multiple audio output devices on Windows 10 (e.g. HDMI, regular headphones via the headphone jack, a couple of Bluetooth headphones) like I do, and you always right-click the volume icon to change default devices and wish there was an easier & faster way to do this, look no further! Check out SoundSwitch. :) Open source and actively developed too.