Creative Commons Attribution 4.0 International License
© Rakhesh Sasidharan

[Aside] Memory Resource Management in ESXi

Came across this PDF from VMware while reading on memory management. It’s dated, but a good read. Below are some notes I took while reading it. Wanted to link to the PDF and also put these somewhere; hence this post.

Some terminology:

  • Host physical memory <–[mapped to]– Guest physical memory (contiguous virtual address space presented by Hypervisor to Guest OS) <–[mapped to]– Guest virtual memory (contiguous virtual address space presented by Guest OS to its applications).
    • Guest virtual -> Guest physical mapping is in Guest OS page tables
    • Guest physical -> Host physical mapping is in pmap data structure
      • There’s also a shadow page table that the Hypervisor maintains, which maps Guest virtual -> Host physical directly.
      • A VM does Guest virtual -> Guest physical mapping via hardware Translation Lookaside Buffers (TLBs). The hypervisor intercepts calls to these and uses them to keep its shadow page tables up to date.
  • Guest physical memory -> Guest swap device (disk) == Guest level paging.
  • Guest physical memory -> Host swap device (disk) == Hypervisor swapping.
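
The two mappings above can be sketched with plain dictionaries. This is only a toy model, assuming 4 KiB pages and made-up page numbers; the composed table stands in for what (as far as I understand it) a shadow page table holds, and none of the names come from ESXi itself.

```python
PAGE_SIZE = 4096  # assumed 4 KiB pages

# Guest OS page table: guest virtual page -> guest physical page (made-up values)
guest_page_table = {0: 7, 1: 3}
# Hypervisor pmap: guest physical page -> host physical page (made-up values)
pmap = {7: 42, 3: 19}


def build_shadow(guest_pt, host_map):
    """Compose the two mappings into guest virtual page -> host physical page."""
    return {gv: host_map[gp] for gv, gp in guest_pt.items() if gp in host_map}


def translate(guest_virtual_addr, shadow_table):
    """Translate a guest virtual address into a host physical address."""
    page, offset = divmod(guest_virtual_addr, PAGE_SIZE)
    return shadow_table[page] * PAGE_SIZE + offset


shadow = build_shadow(guest_page_table, pmap)
# Guest virtual page 1 -> guest physical page 3 -> host physical page 19
print(translate(PAGE_SIZE + 10, shadow))
```

A guest physical page with no entry in pmap simply does not appear in the composed table, which is the situation where the hypervisor would have to step in and allocate backing memory.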

Some interesting bits on the process:

  • Applications use OS provided interfaces to allocate & de-allocate memory.
  • OSes have different implementations on how memory is classified as free or allocated. For example: two lists.
  • A VM has no pre-allocated physical memory.
  • Hypervisor maintains its own data structures for free and allocated memory for a VM.
  • Allocating memory for a VM is easy. When the VM Guest OS makes a request to a certain location, it will generate a page fault. The hypervisor can capture that and allocate memory.
  • De-allocation is tricky because there’s no way for the hypervisor to know the memory is not in use. These lists are internal to the OS. So there’s no straight-forward way to take back memory from a VM.
  • The host physical memory assigned to a VM doesn’t keep growing indefinitely though as the guest OS will free and allocate within the range assigned to it, so it will stick within what it has. And side by side the hypervisor tries to take back memory anyways.
    • Only when the VM tries to access memory that is not actually mapped to host physical memory does a page fault happen. The hypervisor will intercept that and allocate memory.
  • For de-allocation, the hypervisor adds the VM assigned memory to a free list. Actual data in the physical memory may not be modified. Only when that physical memory is subsequently allocated to some other VM does it get zeroed out.
  • Ballooning is one way of reclaiming memory from the VM. This is a driver loaded in the Guest OS.
    • Hypervisor tells ballooning driver how much memory it needs back.
    • Driver will pin those memory pages using Guest OS APIs (so the Guest OS thinks those pages are in use and should not assign to anyone else).
    • Driver will inform Hypervisor it has done this. And Hypervisor will remove the physical backing of those pages from physical memory and assign it to other VMs.
    • Basically the balloon driver inflates the VM’s memory usage, giving it the impression a lot of memory is in use. Hence the term “balloon”.
  • Another way is Hypervisor swapping. In this the Hypervisor swaps to physical disk some of the physical memory it has assigned to the VM. So what the VM thinks is physical memory is actually on disk. This is basically swapping – just that it’s done by Hypervisor, instead of Guest OS.
    • This is not at all preferred coz it’s obviously going to affect VM performance.
    • Moreover, the Guest OS too could swap the same memory pages to its disk if it is under memory pressure. Hence double paging.
  • Ballooning is slow. Hypervisor swapping is fast. Ballooning is preferred though; Hypervisor swapping is only used when under lots of pressure.
  • Host (Hypervisor) has 4 memory states (view this via esxtop, press m).
    • High == All Good
    • Soft == Start ballooning. (Starts before the soft state is actually reached).
    • Hard == Hypervisor swapping too.
    • Low == Hypervisor swapping + block VMs that use more memory than their target allocations.
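
To make the allocate-on-fault and ballooning steps above concrete, here is a toy model. It is a sketch only: the class, method names and page counts are invented for illustration, and the state-to-action table just restates the four states listed above rather than any real ESXi thresholds.

```python
from dataclasses import dataclass, field

# Reclamation actions per host memory state, as listed in the notes above.
ACTIONS = {
    "high": [],
    "soft": ["balloon"],
    "hard": ["balloon", "swap"],
    "low": ["swap", "block"],
}


@dataclass
class ToyHost:
    free_pages: int
    backing: dict = field(default_factory=dict)  # (vm, guest page) -> backed?

    def touch(self, vm, guest_page):
        """First access 'page faults'; the host backs the page on demand."""
        key = (vm, guest_page)
        if key not in self.backing:
            if self.free_pages == 0:
                raise MemoryError("host is out of free pages")
            self.free_pages -= 1
            self.backing[key] = True

    def balloon(self, vm, pinned_pages):
        """The guest's balloon driver pinned these pages; drop their backing."""
        for page in pinned_pages:
            if self.backing.pop((vm, page), None):
                self.free_pages += 1


host = ToyHost(free_pages=4)
for page in range(3):
    host.touch("vm1", page)   # vm1 now consumes 3 host pages
host.balloon("vm1", [0, 1])   # driver pinned guest pages 0 and 1
host.touch("vm2", 0)          # the reclaimed memory can back another VM
print(host.free_pages)
```

The point of the model is that the host never needs to know which guest pages are free; it only acts on the pages the balloon driver hands over.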

 

TIL: vCenter inherited permissions are not cumulative

Say you are part of two groups. Group A has full rights on the vCenter. Group B has limited rights on a cluster.

You would imagine that since you are a member of Group A and that has full rights on vCenter itself, your rights on the cluster in question won’t be limited. But nope, you are wrong. Since you are a member of Group B and that has limited rights on the cluster, your rights too are restricted. Bummer if you are a member of multiple groups and some of these groups have limited rights on child objects! :o)

Workaround is to add yourself or Group A explicitly on that cluster, with full rights. Then the permissions become cumulative.
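
The behaviour can be modelled roughly like this. It is a toy resolver written from the behaviour described above, not from vCenter's actual implementation; the object names, group names and privilege labels are all made up.

```python
# Toy inventory: each object points at its parent.
PARENT = {"cluster": "vcenter", "vcenter": None}


def effective_privileges(obj, groups, assignments):
    """Use the nearest object that has an assignment for any of the user's groups."""
    while obj is not None:
        here = [
            privileges
            for (target, group), privileges in assignments.items()
            if target == obj and group in groups
        ]
        if here:
            # Groups assigned on the same object are cumulative.
            return set().union(*here)
        obj = PARENT[obj]
    return set()


FULL = {"read", "modify", "admin"}
assignments = {
    ("vcenter", "GroupA"): FULL,
    ("cluster", "GroupB"): {"read"},
}
me = {"GroupA", "GroupB"}

limited = effective_privileges("cluster", me, assignments)

# Workaround: add Group A explicitly on the cluster.
assignments[("cluster", "GroupA")] = FULL
fixed = effective_privileges("cluster", me, assignments)
print(limited, fixed)
```

In this model the lookup stops at the first object where any of my groups has an assignment, which is why the cluster-level Group B entry hides the vCenter-level Group A entry until Group A is added on the cluster too.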

MCS choices (RAM cache & Disk cache)

Just a reminder to myself …

When creating a Desktop based Machine Catalog here are my choices:

If I choose Random then I get the option to allocate some of my RAM towards a cache, and also create a disk cache. RAM cache means all data is written to RAM first and then to disk as RAM fills up. And disk cache is like the Write Cache disk in PVS – you can specify a separate disk (maybe local to the host, or SSD storage) where data is written to.

Important to keep in mind here that the actual VM disk will not have any data written to it. All data writes go either to the RAM cache or the Disk cache. First RAM cache, then Disk cache. Both are optional; best to have both (or at least don’t do RAM cache only unless you have oodles of RAM!).

Read this post – it’s a good one. Also, check out the official post from Citrix introducing this feature in XenDesktop 7.9. MCS (Machine Creation Services) that makes use of RAM or Disk cache is known as MCSIO (Machine Creation Services Storage Optimization (beats me how that acronym works! :p)).

MCS VMs have two disks apart from the OS base disk – an identity disk and a delta disk. MCSIO VMs too have the identity disk and delta disk, but the delta disk is only used for maintenance tasks. Hence my comment above that when using either of these cache options, the size you allocate for these is your write cache/ delta disk. 

If I choose static I have three further options. 

If I go with static + save changes to a personal vDisk, I don’t get the option for cache disk etc. I can only choose my vDisk letter and size. 

 If I go with static + create a dedicated VM, again I don’t get any option for cache disk; I can only choose the copy mode (i.e. a linked clone or a full clone). 

If I go with static + discard all changes, then I get the option for cache disk and RAM allocation towards cache. Basically, static + discard is similar to random. You are not storing any changes, so it makes sense to use cache (RAM and/ or write cache). 

In the case of Server OS, I don’t have any choices (it’s always random) and I get the option for cache disk and RAM allocation.

MCSIO is only for non-persistent experiences. 
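
As a quick reference, the choices above boil down to a small decision function. This is just my notes restated as code; the function name and the option labels ("random", "static", "pvd", "dedicated", "discard") are invented and are not Studio or SDK identifiers.

```python
def cache_options_offered(os_type, desktop_type=None, static_mode=None):
    """Return True when the catalog wizard offers the RAM/disk cache options."""
    if os_type == "server":
        return True  # Server OS is always random
    if os_type != "desktop":
        raise ValueError(f"unknown OS type: {os_type}")
    if desktop_type == "random":
        return True
    if desktop_type == "static":
        # Only "discard all changes" is non-persistent.
        return static_mode == "discard"
    raise ValueError(f"unknown desktop type: {desktop_type}")


for mode in ("pvd", "dedicated", "discard"):
    print(mode, cache_options_offered("desktop", "static", mode))
```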

Notes to self on XenServer storage

Playing with XenServer in my testlab (basically as a VM in VMware Workstation hah!) I ran into trouble while creating a Machine Catalog via Citrix Desktop Studio. I forget the exact message but it was about lack of resources. I could see that in the create catalog process it was creating a snapshot and making a copy VM, powering it on and off successfully, and then it was failing. I kept an eye on my storage during this and saw that indeed it was exceeding the allocated space. I had thought it would do thin provisioning but in retrospect I realize XenServer never asked me about thick or thin when I added my iSCSI storage. Hmm.

Well turns out that for iSCSI XenServer has only thick provisioning. You get thin provisioning only if you are using the ext3 filesystem or NFS. Since iSCSI uses LVM, bummer! 

Here’s a forum post on how to identify if your SR is thick or not. 

Regarding thin provisioning – it is only for locally attached storage (which can use ext3) or NFS. Block attached storage is thick.

Before I realized all this I had spent some time Googling on how to create a thin provisioned SR (Storage Repository). I felt that maybe it’s a GUI restriction and I can work around it by using the CLI. Turns out I was wrong. Here’s an article that explains SRs in XenServer anyway. It’s a good read. Here’s an article just on enabling thin provisioning for ext3 SRs via the CLI. 

While on the topic of storage, this is something I wanted to blog about earlier but never got around to. When using SMB/ CIFS shares, XenServer only supports NTLMv1. Here’s instructions on using NTLMv2

Also, smbclient is a good tool to test SMB connects from a XenServer. Example:

That seems to work, but I get a logon failure. This is because I didn’t put the username in quotes. 

That works!

I have no idea what the three commands below do, except that they are to do with mounting an SMB/ CIFS share on a XenServer permanently. I had noted these commands as part of my would be blog post, but it’s been a while now and I forget. Sometime when I get around to doing SMB3 or NTLMv2 with XenServer again I hope to refer to these again and better explain. I don’t want to spend too much time on XenServers now and get sidetracked …

After issuing the above commands I think the shared folder is mounted only on one host in the pool. But right clicking on it and doing a repair will get it mounted on all hosts in the pool.

XenServer 7.0 and above support SMB for VM disk storage too. Prior versions support SMB only for ISO storage. 

Add-DnsServerZoneDelegation with multiple nameservers

Only reason I am creating this post is coz I Googled for the above and didn’t find any relevant hits.

I know I can use the Add-DnsServerZoneDelegation cmdlet to create a new delegated zone (basically a sub-domain of a zone hosted by our DNS server, wherein some other DNS server hosts that sub-domain and we merely delegate any requests for that sub-domain to this other DNS server). But I wasn’t sure how I’d add multiple name servers. The switches give an option to give an array of IP addresses, but that’s just an array of IP addresses for a single name server. What I wanted was to have an array of name servers each with their own IP.

Anyways, turns out all I had to do was run the command for each name server. 

Above I create a delegation from my zone “parentzone.com” to 3 DNS servers “DNS0[1-3].somedomain” (also specified by their respective IPs) for the sub-domain “subzone.parentzone.com”.
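
The "one call per name server" idea can be sketched as a small generator of command lines. This is not the exact command from my lab: the IP addresses are placeholders from the documentation range, and I am assuming the cmdlet's -Name, -ChildZoneName, -NameServer and -IPAddress parameters, with the child zone given as the relative name "subzone".

```python
# Name servers from the example above; the IPs are hypothetical placeholders.
NAME_SERVERS = {
    "DNS01.somedomain": "192.0.2.1",
    "DNS02.somedomain": "192.0.2.2",
    "DNS03.somedomain": "192.0.2.3",
}


def delegation_commands(parent_zone, child_zone, name_servers):
    """Build one Add-DnsServerZoneDelegation call per name server."""
    return [
        f'Add-DnsServerZoneDelegation -Name "{parent_zone}" '
        f'-ChildZoneName "{child_zone}" -NameServer "{ns}" -IPAddress {ip}'
        for ns, ip in name_servers.items()
    ]


for command in delegation_commands("parentzone.com", "subzone", NAME_SERVERS):
    print(command)
```

Each printed line is meant to be run in PowerShell on the DNS server; the Python part only shows the loop.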

NSX Firewall not working on Layer3; OpenBSD VMware Tools; IP Discovery, etc.

I have two security groups. Network 1 VMs (a group that contains my VMs in the 192.168.1.0/24) and Network 2 VMs (similar, for 192.168.2.0/24 network). 

Both are dynamic groups. I select members based on whether the VM name contains -n1 or -n2. (The whole exercise is just for fun/ getting to know this stuff). 

I have two firewall rules making use of these groups. Layer 2 and Layer 3. 

The Layer 2 rule works but the Layer 3 one does not! Weird. 

I decided to troubleshoot this via the command line. Figured it would be a good opportunity.

To troubleshoot I have to check the rules on the hosts (because remember, that’s where the firewall is; it’s a kernel module in each host). For that I need to get the host-id. For which I need to get the cluster-id. Sadly there’s no command to list all hosts (or at least I don’t know of any). 

So now I have my host-ids.

Let’s also take a look at my VMs (thankfully it’s a short list! I wonder how admins do this in real life):

We can see the filters applying to each VM.  To summarize:

And are these filters applying on the hosts themselves?

Hmm, that too looks fine. 

Next I picked up one of the rule sets and explored it further:

The Layer 3 & Layer 2 rules are in separate rule sets. I have marked the ones which I am interested in. One works, the other doesn’t. So I checked the address sets used by both:

Tada! And there we have the problem. The address set for the Layer 3 rule is empty. 

I checked this for the other rules too – same situation. I modified my Layer 3 rule to specifically target the subnets:

And the address set for that rule is not empty:

And because of this the firewall rules do work as expected. Hmm.

I modified this rule to be a group with my OpenBSD VMs from each network explicitly added to it (i.e. not dynamic membership in case that was causing an issue). But nope, same result – empty address set!

But the address set is now empty. :o)

So now I have an idea of the problem. I am not too surprised by this because I vaguely remember reading something about VMware Tools and IP detection inside a VM (i.e. NSX makes use of VMware Tools to know the IP address of a VM) and also because I am aware OpenBSD does not use the official VMware Tools package (it has its own and that only provides a subset of functions).

Googling a bit on this topic I came across the IP address Discovery section in the NSX Admin guide – prior to NSX 6.2 if VMware Tools wasn’t installed (or was stopped) NSX won’t be able to detect the IP address of the VM. Post NSX 6.2 it can do DHCP & ARP snooping to work around a missing/ stopped VMware Tools. We configure the latter in the host installation page:

I am going to go ahead and enable both on all my clusters. 

That helped. But it needs time. Initially the address set was empty. I started pings from one VM to another and the source VM IP was discovered and put in the address set; but since the destination VM wasn’t in the list traffic was still being allowed. I stopped pings, started pings, waited a while … tried again … and by then the second VM IP too was discovered and put in the address set – effectively blocking communication between them. 
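
Why an empty (or half-populated) address set lets traffic through can be shown with a tiny model. It assumes a block rule that matches only when both source and destination are in their address sets, with everything else falling through to a default allow; the VM addresses are hypothetical.

```python
def blocked(src, dst, source_set, destination_set):
    """A block rule only matches when both endpoints are in its address sets."""
    return src in source_set and dst in destination_set


VM1, VM2 = "192.168.1.10", "192.168.2.10"  # hypothetical VM addresses

net1, net2 = set(), set()                # nothing discovered yet
before = blocked(VM1, VM2, net1, net2)

net1.add(VM1)                            # snooping learns the source first
partial = blocked(VM1, VM2, net1, net2)

net2.add(VM2)                            # later the destination is learnt too
after = blocked(VM1, VM2, net1, net2)

print(before, partial, after)
```

Only the last check returns True, which matches what I saw: pings kept working until both IPs had been discovered.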

Side by side I installed a Windows 8.1 VM with VMware Tools etc and tested to see if it was being automatically picked up (I did this before enabling the snooping above). It was. In fact its IPv6 address too was discovered via VMware Tools and added to the list:

Nice! Picked up something interesting today. 

Nested XenServer crashes when scrubbing memory

In case anyone else runs into this. I noticed that both XenServer 6.5 and 7.0 crash at the memory scrubbing stage during boot up when run as a VM within VMware Workstation (and possibly other virtualization products too – I didn’t try it with anything else). 

Am guessing the crash happens because the memory is not really available (this being a nested VM) and so the process crashes. Anyhoo, the workaround is to disable memory scrubbing. Check this blog post for instructions. 

In brief, the instructions are to add the option bootscrub=false to the boot options. This is via the file /boot/extlinux.conf in XenServer 6.5; or via /boot/grub/grub.cfg in XenServer 7.0.

Notes to self while installing NSX 6.3 (part 4)

Reading through the VMware NSX 6.3 Install Guide after having installed the DLR and ESG in my home lab. Continuing from the DLR section.

As I had mentioned earlier NSX provides routing via DLR or ESG.  

  • DLR == Distributed Logical Router.
  • ESG == Edge Services Gateway

DLR consists of an appliance that provides the control plane functionality. This appliance does not do any routing itself. The actual routing is done by the VIBs on the ESXi hosts. The appliance uses the NSX Controller to push out updates to the ESXi hosts. (Note: Only DLR. ESG does not depend on the Controller to push out routes). Couple of points to keep in mind:

  • A DLR instance cannot connect to logical switches in different transport zones. 
  • A DLR cannot connect to a dvPortgroup with VLAN ID 0.
  • A DLR cannot connect to a dvPortgroup with a VLAN ID if that DLR also connects to logical switches spanning more than one VDS. 
    • This confused me. Why would a logical switch span more than one VDS? I dunno. There are reasons probably, same way you could have multiple clusters in same data center having different VDSes instead of using the same one. 
  • If you have portgroups on different VDSes with the same VLAN ID, and these VDSes share some hosts, then DLR cannot connect these. 

I am not entirely clear on the above points. It’s more to ensure that the transport zones and logical switches align correctly, but I haven’t entirely understood it so I am simply going to make note as above and move on …

In a DLR the firewall rules only apply to the uplink interface and are limited to traffic destined for the edge virtual appliance. In other words they don’t apply to traffic between the logical switches a DLR instance connects. (Note that this refers to the firewall settings found under the DLR section, not in the Firewall section of NSX). 

A DLR has many interfaces. The one exposed to VMs for routing is the Logical InterFace (LIF). Here’s a screenshot from the interfaces on my DLR. 

The ones of type ‘Internal’ are the LIFs. These are the interfaces that the DLR will route between. Each LIF connects to a separate network – in my case a logical switch each. The IP address assigned to this LIF will be the address you set as gateway for the devices in that network. So for example: one of the LIFs has an IP address 192.168.1.253 and connects to my 192.168.1.0/24 segment. All the VMs there will have 192.168.1.253 as their default gateway. Suppose we ignore the ‘Uplink’ interface for now (it’s optional, I created it for the external routing to work), and all our DLR had were the two ‘Internal’ LIFs, and VMs on each side had the respective IP address set as their default gateway, then our DLR will enable routing between these two networks. 
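
A rough way to picture the LIFs is as a table of connected networks that a destination address is matched against. The sketch below uses Python's ipaddress module; the LIF names and the second LIF's address (192.168.2.253) are assumptions for illustration, only 192.168.1.253 comes from my lab.

```python
import ipaddress

# Internal LIFs: name -> gateway address and prefix of the attached segment.
LIFS = {
    "lif-net1": ipaddress.ip_interface("192.168.1.253/24"),
    "lif-net2": ipaddress.ip_interface("192.168.2.253/24"),  # assumed address
}


def egress_lif(destination):
    """Return the LIF whose connected network contains the destination."""
    address = ipaddress.ip_address(destination)
    for name, interface in LIFS.items():
        if address in interface.network:
            return name
    return None  # no connected network; an uplink/default route would be needed


# A VM on 192.168.1.0/24 sends to its gateway; the DLR picks the other LIF.
print(egress_lif("192.168.2.10"))
print(egress_lif("10.0.0.1"))
```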

Unlike a physical router though, which exists outside the virtual network and which you can point to as “here’s my router”, there’s no such concept with DLRs. The DLR isn’t a VM which you can point to as your router. Nor is it a VM to which packets between these networks (logical switches) are sent to for routing. The DLR, as mentioned above, is simply your ESXi hosts. Each ESXi host that has logical switches which a DLR connects into has this LIF created in them with that LIF IP address assigned to it and a virtual MAC so VMs can send packets to it. The DLR is your ESXi host. (That is pretty cool, isn’t it! I shouldn’t be amazed because I had mentioned it earlier when reading about all this, but it is still cool to actually “see” it once I have implemented).

Above screenshot is from my two VMs on the same VXLAN but on different hosts. Note that the default gateway (192.168.1.253) MAC is the same for both. Each of their hosts will respond to this MAC entry. 

(Note to self: Need to explore the net-vdr command sometime. Came across it as I was Googling on how to find the MAC address table seen by the LIF on a host. Didn’t want to get side-tracked so didn’t explore too much. There’s something called a VDR (not encountered it yet in my readings).

  • net-vdr -I -l will list all the VDRs on a host.
  • net-vdr -L -l <vdrname> will list the LIFs.
  • net-vdr -N -l <vdrname> will list the MAC addresses (ARP info)

)

When creating a DLR it is possible to create it with or without the appliance. Remember that the appliance provides the control plane functionality. It is the appliance that learns of new routes etc and pushes to the DLR modules in the ESXi hosts. Without an appliance the DLR modules will do static routing (which might be more than enough, especially in a test environment like my nested lab for instance) so it is ok to skip it if your requirements are such. Adding an appliance means you get to (a) select if it is deployed in HA config (i.e. two appliances), (b) their locations etc, (c) IP address and such for the appliance, as well as enabling SSH. The appliance is connected to a different interface for HA and SSH – this is independent of the LIFs or Uplink interfaces. That interface isn’t used for any routing. 

Apart from the control plane, the appliance also controls the firewall on the DLR. If there’s no appliance you can’t make any firewall changes to the DLR – makes sense coz there’s nothing to change. You won’t be connecting to the DLR for SSH or anything coz you do that to the appliance on the HA interface. 

According to the docs you can’t add an appliance once a DLR instance is deployed. Not sure about that as I do see an option to deploy an appliance on my non-appliance DLR instance. Maybe it will fail when I actually try and create the appliance – I didn’t bother trying. 

Discovered this blog post while Googling for something. I’ve encountered & linked to his posts previously too. He has a lot of screenshots and step by step instructions. So worth a check out if you want to see some screenshots and much better explanation than me. :) Came across some commands from his blog which can be run on the NSX Controller to see the DLRs it is aware of and their interfaces. Pasting the output from my lab here for now, I will have to explore this later …

I have two DLRs. One has an appliance, other doesn’t. I made these two, and a bunch of logical switches to hook these to, to see if there’s any difference in functionality or options.

One thing I realized as part of this exercise is that a particular logical switch can only connect to one DLR. Initially I had one DLR which connected to 192.168.1.0/24 and 192.168.2.0/24. Its uplink was on logical switch 192.168.0.0/24 which is where the ESG too hooked into. Later when I made one more DLR with its own internal links and tried to connect its uplink to the 192.168.0.0/24 network used by the previous DLR, I saw that it didn’t even appear in the list of options. That’s when I realized it’s better to use a smaller range logical switch for the uplinks – like say a /30 network. This way each DLR instance connects to an ESG on its own /30 network logical switch (as in the output above). 
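
A /30 is a natural fit for these transit links because it has exactly two usable addresses, one for the DLR uplink and one for the ESG. A quick check with Python's ipaddress module, carving /30s out of 192.168.0.0/24 (how the range is split here is my assumption, not the addressing from my lab):

```python
import ipaddress

transit_range = ipaddress.ip_network("192.168.0.0/24")

# Carve the range into /30 transit networks, one per DLR-to-ESG link.
transit_links = list(transit_range.subnets(new_prefix=30))

first_link = transit_links[0]
dlr_uplink, esg_internal = first_link.hosts()  # exactly two usable addresses

print(len(transit_links), first_link, dlr_uplink, esg_internal)
```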

A DLR can have up to 8 uplink interfaces and 1000 internal interfaces.


Moving on to ESG. This is a virtual appliance. While a DLR provides East-West routing (i.e. within the virtual environment), an ESG provides North-South routing (i.e. out of the virtual environment). The ESG also provides services such as DHCP, NAT, VPN, and Load Balancing. (Note to self: DLR does not provide DHCP or Load Balancing as one might expect (at least I did! :p). DLR provides DHCP Relay though). 

The uplink of an ESG will be a VDS (Distributed Switch) as that’s what eventually connects an ESXi environment to the physical network. 

An ESG needs an appliance to be deployed. You can enable/ disable SSH into this appliance. If enabled you can SSH into the ESG appliance from the uplink address or from any of the internal link IP addresses. In contrast, you can only SSH into a DLR instance if it has an associated appliance. Even then, you cannot SSH into the appliance from the internal LIFs (coz these don’t really exist, remember … they are on each ESXi host). With a DLR we have to SSH into the interface used for HA (this can be used even if there’s only one appliance and hence no HA). 

When deploying an ESG appliance HA can be enabled. This deploys two appliances in an active/passive mode (and the two appliances will be on separate hosts). These two appliances will talk to each other to keep in sync via one of the internal interfaces (we can specify one, or NSX will just choose any). On this internal interface the appliances will have a link local IP address (a /30 subnet from 169.254.0.0/16) and communicate over that (doesn’t matter that there’s some other IP range actually used in that segment, as these are link local addresses and unlikely anyone’s going to actually use them). In contrast, if a DLR appliance is deployed with HA we need to specify a separate network from the networks that it will be routing between. This can be a logical switch or a DVS, and as with ESG the two appliances will have link local IP addresses (a /30 subnet from 169.254.0.0/16) for communication. Optionally, we can specify an IP address in this network via which we can SSH into the DLR appliance (this IP address will not be used for HA, however).

After setting up all this, I also created two NAT rules just for kicks. 

And with that my basic setup of NSX is complete! (I skipped OSPF as I don’t think I will be using it any time soon in my immediate line of work; and if I ever need to I can come back to it later). Next I need to explore firewalls (micro-segmentation) and possibly load balancing etc … and generally fiddle around with this stuff. I’ve also got to start figuring out the troubleshooting and command-line stuff. But the base is done – I hope!

Yay! (VXLAN) contd. + Notes to self while installing NSX 6.3 (part 3)

Finally continuing with my NSX adventures … some two weeks have passed since my last post. During this time I moved everything from VMware Workstation to ESXi. 

Initially I tried doing a lift and shift from Workstation to ESXi. Actually, initially I went with ESXi 6.5 and that kept crashing. Then I learnt it’s because I was using the HPE customized version of ESXi 6.5 and since the server model I was using isn’t supported by ESXi 6.5 it has a tendency to PSOD. But strangely the non-HPE customized version has no issues. But after trying the HPE version and failing a couple of times, I gave up and went to ESXi 5.5. Set it up, tried exporting from VMware Workstation to ESXi 5.5, and that failed as the VM hardware level on Workstation was newer than ESXi. 

Not an issue – I fired up VMware Converter and converted each VM from Workstation to ESXi. 

Then I thought hmm, maybe the MAC addresses will change and that will cause an issue, so I SSH’ed into the ESXi host and manually changed the MAC addresses of all my VMs to whatever it was in Workstation. Also changed the adapters to VMXNet3 wherever it wasn’t. Reloaded the VMs in ESXi, created all the networks (portgroups) etc, hooked up the VMs to these, and fired them up. That failed coz the MAC address ranges were of VMware Workstation and ESXi refuses to work with those! *grr* Not a problem – change the config files again to add a parameter asking ESXi to ignore this MAC address problem – and finally it all loaded. 

But all my Windows VMs had their adapters reset to a default state. Not sure why – maybe the drivers are different? I don’t know. I had to reconfigure all of them again. Then I turned to OpnSense – that too had reset all its network settings, so I had to configure those too – and finally to nested ESXi hosts. For whatever reason none of them were reachable; and worse, my vCenter VM was just a pain in the a$$. The web client kept throwing some errors and simply refused to open. 

That was the final straw. So in frustration I deleted it all and decided to give up.

But then …

I decided to start afresh. 

Installed ESXi 6.5 (the VMware version, non-HPE) on the host. Created a bunch of nested ESXi VMs in that from scratch. Added a Windows Server 2012R2 as the shared iSCSI storage and router. Created all the switches and port groups etc, hooked them up. Ran into some funny business with the Windows Firewall (I wanted to assign some interfaces as Private, others as Public, and enable firewall only on the Public ones – but after each reboot Windows kept resetting this). So I added OpnSense into the mix as my DMZ firewall.

So essentially you have my ESXi host -> which hooks into an internal vSwitch portgroup that has the OpnSense VM -> which hooks into another vSwitch portgroup where my Server 2012R2 is connected to, and that in turn connects to another vSwitch portgroup (a couple of them actually) where my ESXi hosts are connected to (need a couple of portgroup as my ESXi hosts have to be in separate L3 networks so I can actually see a benefit of VXLANs). OpnSense provides NAT and firewalling so none of my VMs are exposed from the outside network, yet they can connect to the outside network if needed. (I really love OpnSense by the way! An amazing product). 

Then I got to the task of setting these all up. Create the clusters, shared storage, DVS networks, install my OpenBSD VMs inside these nested ESXi hosts. Then install NSX Manager, deploy controllers, configure the ESXi hosts for NSX, setup VXLANs, segment IDs, transport zones, and finally create the Logical Switches! :) I was pissed off initially at having to do all this again, but on the whole it was good as I am now comfortable setting these up. Practice makes perfect, and doing this all again was like revision. Ran into problems at each step – small niggles, but it was frustrating. Along the way I found that my (virtual) network still does not seem to support large MTU sizes – but then I realized it’s coz my Server 2012R2 VM (which is the router) wasn’t setup with the large MTU size. Changed that, and that took care of the MTU issue. Now both Web UI and CLI tests for VXLAN succeed. Finally!

Third time lucky hopefully. Above are my two OpenBSD VMs on the same VXLAN, able to ping each other. They are actually on separate L3 ESXi hosts so without NSX they won’t be able to see each other. 

Not sure why there are duplicate packets being received. 

Next I went ahead and set up a DLR so there’s communication between VXLANs. 

Yeah baby! :o)

Finally I spent some time setting up an ESG and got these OpenBSD VMs talking to my external network (and vice versa). 

The two command prompt windows are my Server 2012R2 on the LAN. It is able to ping the OpenBSD VMs and vice versa. This took a bit more time – not on the NSX side – as I forgot to add the routing info on the ESG for my two internal networks (192.168.1.0/24 and 192.168.2.0/24) as well as on the Server 2012R2 (192.168.0.0/16). Once I did that routing worked as above. 

I am aware this is more of a screenshots plus talking post rather than any techie details, but I wanted to post this here as a record for myself. I finally got this working! Yay! Now to read the docs and see what I missed out and what I can customize. Time to break some stuff finally (intentionally). 

:o)

OU delegation not working (contd.) – finding protected groups

Turns out I was mistaken in my previous post. A few minutes after enabling inheritance, I noticed it was disabled again. So that means the groups must be protected by AD.

I knew of the AdminSDHolder object and how it provides a template set of permissions that are applied to protected accounts (i.e. members of groups that are protected). I also knew that there were some groups that are protected by default. What I didn’t know, however, was that the defaults can be changed. 

Initially I did a Compare-Object -ReferenceObject (Get-ADPrincipalGroupMembership User1) -DifferenceObject (Get-ADPrincipalGroupMembership User2) -IncludeEqual to compare the memberships of two random accounts that seemed to be protected. These were accounts with totally different roles & group memberships so the idea was to see if they had any common groups (none!) and failing that to see if the groups they were in had any common ancestors (none again!)
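
The same comparison, including the "common ancestors" step, can be sketched with sets. The users, groups and nesting below are entirely made up; in practice the memberships would come from Get-ADPrincipalGroupMembership.

```python
def with_ancestors(groups, member_of):
    """Return the groups plus everything they are nested in, transitively."""
    seen = set()
    stack = list(groups)
    while stack:
        group = stack.pop()
        if group not in seen:
            seen.add(group)
            stack.extend(member_of.get(group, ()))
    return seen


# Hypothetical direct memberships and group nesting.
user1_groups = {"Storage Admins"}
user2_groups = {"Exchange Admins"}
member_of = {
    "Storage Admins": {"Infra Tier 2"},
    "Exchange Admins": {"Messaging Tier 2"},
}

common_direct = user1_groups & user2_groups
common_ancestors = with_ancestors(user1_groups, member_of) & with_ancestors(
    user2_groups, member_of
)
print(common_direct, common_ancestors)
```

With this sample data both intersections are empty, which is the same dead end I hit: nothing in the membership chains explains why both accounts are protected.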

Then I Googled a bit :o) and came across a solution. 

Before moving on to that though, as a note to myself: 

  • The AdminSDHolder object is at CN=AdminSDHolder,CN=System,DC=domain,DC=com. Find that via ADSI Edit (replace the domain part accordingly). 
  • Right click the object and its Security tab lists the template permissions that will be applied to members of protected groups. You can make changes here. 
  • SDProp is a process that runs every 60 minutes on the DC holding the PDC Emulator role. The period can be changed via the registry key HKLM\SYSTEM\CurrentControlSet\Services\NTDS\Parameters\AdminSDProtectFrequency. (If it doesn’t exist, add it. DWORD). 
  • SDProp can be run manually if required. 

So back to my issue. Turns out if a group has its adminCount attribute set to 1 then it will be protected. So I ran the following against the OU containing my admin account groups:

Bingo! Most of my admin groups were protected, so most admin accounts were protected. All I have to do now is either un-protect these groups (my preferred solution), or change the template to delegate permissions there. 

Update: Simply un-protecting a group does not un-protect all its members (this is by design). The member objects too have their adminCount attribute set to 1, so apart from un-protecting the groups we must un-protect the members too.
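A rough sketch of clearing the attribute on a group's members, assuming the group itself has already been un-protected. The group name is hypothetical, and -WhatIf is left on so nothing changes until it is removed. Clearing adminCount is only the bookkeeping part; inheritance on each account still has to be re-enabled separately.

```powershell
Import-Module ActiveDirectory

# Hypothetical group name.
$group = 'My Admin Group'

# Find user members (including nested ones) whose adminCount is 1 and clear it.
Get-ADGroupMember -Identity $group -Recursive |
    Where-Object { $_.objectClass -eq 'user' } |
    Get-ADUser -Properties adminCount |
    Where-Object { $_.adminCount -eq 1 } |
    Set-ADUser -Clear adminCount -WhatIf   # remove -WhatIf to actually apply
```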

Update 2: Found this good post with lots more details. How to run the process manually, what are the default protected groups, etc. Read that post in conjunction with this one and you are set!

Update 3: You can unprotect the following default groups via dsHeuristics: 1) Account Operators, 2) Backup Operators, 3) Server Operators, 4) Print Operators. But that still leaves groups such as Administrators (built-in), Domain Admins, Enterprise Admins, Domain Controllers, Schema Admins, Read-Only Domain Controllers, and the user Administrator (built-in). There’s no way to un-protect members of these.

Something I hadn’t realized about adminCount. This attribute does not mean a group/ user will be protected. Instead, what it means is that if a group/ user is protected, and its ACLs have changed and are now reset to default, then the adminCount attribute will be set. So yes, adminCount will let you find groups/ users that are protected; but merely setting adminCount on a group/ user does not protect it. I learnt this the hard way while I was testing my changes. Set adminCount to 1 for a group and saw that nothing was happening.

Also, it is possible that a protected user/ group does not have adminCount set. This is because adminCount is only set if there is a difference in the ACLs between the user/ group and the AdminSDHolder object. If there’s no difference, a protected object will not have the adminCount attribute set. :)

OU delegation not working

Today I cracked a problem which had troubled us for a while but which I never really sat down and actually tried to troubleshoot. We had an OU with 3rd level admin accounts that no one else had rights to, but we wanted to delegate certain password related tasks to our Service Desk admins. Basically let them reset passwords, unlock accounts, and enable/ disable them.

Here are some screenshots of the delegation wizard. Password reset is a common task and can be seen in the screenshot itself. Enable/ Disable can be delegated by giving rights to the userAccountControl attribute. Only force password change rights (i.e. no reset password) can be given via the pwdLastSet attribute. And unlock can be given via the lockoutTime attribute.

Problem was that in my case in spite of doing all this the delegated accounts had no rights!

Snooping around a bit I realized that all the admin accounts within the OU had inheritance disabled and so weren’t getting the delegated permissions from the OU (not sure why; and no these weren’t protected group members). 

Of course, enabling is easy. But I wanted to see if I could get a list of all the accounts in there with their inheritance status. Time for PowerShell. :)

The Get-ACL cmdlet can list access control lists. It can work with AD objects via the AD: drive. Needs a distinguished name, that’s all. So all you have to do is (Get-ADUser <accountname>).DistinguishedName – prefix an AD: to this, and pass it to Get-ACL. Something like this:
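A minimal sketch of that call; the account name is a placeholder.

```powershell
# Importing the module is what creates the AD: drive.
Import-Module ActiveDirectory

# 'someadmin' is a hypothetical account name.
$dn = (Get-ADUser -Identity 'someadmin').DistinguishedName

# Prefix the distinguished name with the AD: drive and hand it to Get-Acl.
Get-Acl -Path "AD:\$dn"
```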

The default result is useless. If you pipe and expand the Access property you will get a list of access control entries (ACEs).
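For instance, something along these lines (again with a placeholder account name); the second command just trims each entry down to a handful of the properties it carries.

```powershell
Import-Module ActiveDirectory

# 'someadmin' is a hypothetical account name.
$dn = (Get-ADUser -Identity 'someadmin').DistinguishedName

# Expand the Access property to see the individual entries.
Get-Acl -Path "AD:\$dn" | Select-Object -ExpandProperty Access

# Same thing, keeping only a few of the properties on each entry.
(Get-Acl -Path "AD:\$dn").Access |
    Select-Object IdentityReference, ActiveDirectoryRights, AccessControlType, ObjectType, IsInherited
```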

The result is a series of entries like these:

The attribute names referred to by the GUIDs can be found in the AD Technical Specs.
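If looking GUIDs up by hand gets tedious, the directory itself can be asked. This is a sketch of one common approach, not something from the specs: build a lookup table from the schema (schemaIDGUID) and from the extended rights container (rightsGuid), then translate each entry's ObjectType. The account name and the $guidMap variable are my own placeholders.

```powershell
Import-Module ActiveDirectory

$rootDse = Get-ADRootDSE
$guidMap = @{}

# Attributes and classes: schemaIDGUID -> lDAPDisplayName.
Get-ADObject -SearchBase $rootDse.schemaNamingContext -LDAPFilter '(schemaIDGUID=*)' `
    -Properties lDAPDisplayName, schemaIDGUID |
    ForEach-Object { $guidMap[[guid]$_.schemaIDGUID] = $_.lDAPDisplayName }

# Extended rights (e.g. password reset): rightsGuid -> name.
Get-ADObject -SearchBase "CN=Extended-Rights,$($rootDse.configurationNamingContext)" `
    -LDAPFilter '(objectClass=controlAccessRight)' -Properties name, rightsGuid |
    ForEach-Object { $guidMap[[guid]$_.rightsGuid] = $_.name }

# 'someadmin' is a hypothetical account name.
$dn = (Get-ADUser -Identity 'someadmin').DistinguishedName

# An all-zero ObjectType applies to everything, so it has no entry in the map and shows up blank.
(Get-Acl -Path "AD:\$dn").Access |
    Select-Object IdentityReference, ActiveDirectoryRights,
        @{ Name = 'AppliesTo'; Expression = { $guidMap[$_.ObjectType] } }
```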

Of interest to us is the AreAccessRulesProtected property. If this is True then inheritance is disabled; if False inheritance is enabled. So it’s straightforward to make a list of accounts and their inheritance status:
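A sketch of such a listing, assuming PowerShell 3.0 or later for [pscustomobject]; the OU path is a placeholder.

```powershell
Import-Module ActiveDirectory

# Hypothetical OU; replace with the OU that holds the admin accounts.
$ou = 'OU=Admin Accounts,DC=domain,DC=com'

# One row per account: its name and whether inheritance is switched off.
Get-ADUser -SearchBase $ou -Filter * | ForEach-Object {
    $acl = Get-Acl -Path ("AD:\" + $_.DistinguishedName)
    [pscustomobject]@{
        Name                = $_.Name
        InheritanceDisabled = $acl.AreAccessRulesProtected
    }
}
```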

So that’s it. Next step would be to enable inheritance on the accounts. I won’t be doing this now (as it’s bed time!) but one can do it manually or script it via the SetAccessRuleProtection method. This method takes two parameters (enable/ disable inheritance; and if disable then should we add/ remove existing ACEs). Only the first parameter is of significance in my case, but I have to pass the second parameter too anyways – SetAccessRuleProtection($False,$True).

Update: Here’s what I rolled out at work today to make the change.
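The script itself isn't reproduced here. A minimal sketch of the general approach, using the SetAccessRuleProtection call described above, might look like this. The OU path is a placeholder, and -WhatIf is left on so the write only happens once it is removed.

```powershell
Import-Module ActiveDirectory

# Hypothetical OU; replace with the OU that holds the admin accounts.
$ou = 'OU=Admin Accounts,DC=domain,DC=com'

Get-ADUser -SearchBase $ou -Filter * | ForEach-Object {
    $path = "AD:\" + $_.DistinguishedName
    $acl  = Get-Acl -Path $path

    # Only touch accounts where inheritance is currently disabled.
    if ($acl.AreAccessRulesProtected) {
        # First argument $false re-enables inheritance; the second is ignored in that case.
        $acl.SetAccessRuleProtection($false, $true)
        Set-Acl -Path $path -AclObject $acl -WhatIf   # remove -WhatIf to actually apply
    }
}
```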

Update 2: Didn’t realize I had many users in the built-in protected groups (these are protected even though their adminCount is 0 – I hadn’t realized that). To unprotect these one must set the dsHeuristics flag. The built-in protected groups are 1) Account Operators, 2) Server Operators, 3) Print Operators, and 4) Backup Operators. See this post for instructions (actually, see the post below for even better instructions).

Update 3: Found this amazing page that goes into a hell of a lot of detail on this topic. Be sure to read this before modifying dsHeuristics.
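Before changing anything, it helps to see what dsHeuristics is currently set to. This sketch only reads the value and assumes the ActiveDirectory module; the attribute lives on the Directory Service object in the configuration partition and is often empty.

```powershell
Import-Module ActiveDirectory

# The configuration partition's DN comes from RootDSE, so no domain name is hard-coded.
$configNC = (Get-ADRootDSE).configurationNamingContext

# Read-only: show the current dsHeuristics value (blank if it has never been set).
Get-ADObject -Identity "CN=Directory Service,CN=Windows NT,CN=Services,$configNC" `
    -Properties dsHeuristics |
    Select-Object DistinguishedName, dsHeuristics
```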

Anthropomorphizing

So. Previously I had my OnePlus 3T and iPhone 6 paired with the Sennheiser PXC 550. Whenever I’d connect, the headphones would announce the OnePlus 3T as “phone 1” and the iPhone as “phone 2” as that’s the order I had initially paired them in.

Ever since I paired the iPhone 7 Plus though the headphones announce both phones as “phone 1”. I find that funny coz I imagine it must be confusing to the headphones to have two phones that are both “phone 1” and in my mind it’s as though the iPhone 7 Plus is trying to be a dominant partner and say “no, I too must be phone 1! period!” :)

Just an example of how we try and assign human attributes to gadgets and other things. I find it funny that I am attributing some “nature” to the phone. This is not the only one though. I find that the iPhone 7 Plus gets along better with the OnePlus 3T and Sennheisers. If I have music playing from the OnePlus 3T and I turn on the iPhone 6 it would “claim” the channel so to say by blocking out the OnePlus 3T. The latter would continue playing but nothing comes out of the Sennheisers any more. The iPhone 7 Plus on the other hand is better. It too takes over but 1) pauses the OnePlus 3T and 2) if I am not playing any audio it gives control back to OnePlus 3T and resumes playing music. Again there’s some techie reason for this I am sure, but in my mind I attribute qualities like the iPhone 7 Plus gets along better or whatever. 

Anyhoo. That’s all! :)