[Aside] Citrix VDI Best Practices for XenApp and XenDesktop 7.6 LTSR

This is an amazing document! Skimming through the PDF version and I am blown away. Some day when I have to make Citrix related decisions, this is the document I will be turning to. (Came across it via the Citrix blog, so thank you!)

There’s also a XenDesktop handbook but I haven’t read it yet. 

[Aside] PVS Caching

Was reading this blog post (PVS Cache in RAM with Disk Overflow) when I came across a Citrix KB article that mentioned this feature was introduced because of the ASLR feature introduced in Windows Vista. Apparently when you set the PVS Cache to be the target device hard disk, it causes issues with ASLR. Not sure how ASLR (which is a memory thing) should be affected by disk write cache choices, but there you go. It’s something to do with PVS modifying the Memory Descriptor List (MDL) before writing it to the disk cache, and then when Windows reads it back and finds the MDL has changed from what it expected it to be, it crashes due to ASLR protection. 

Anyhow, while Googling that I came across this nice Citrix article on the various types of caching PVS offers:

  • Cache on the PVS Server (not recommended in production due to poor performance)
  • Cache on device RAM
    • A portion of the device’s RAM is reserved as cache and not usable by the OS. 
  • Cache on device Disk
    • It’s also possible to use the device Disk buffers (i.e. the disk cache). By default it’s disabled, but can be enabled.
    • This is actually implemented via a file on the device Disk (called .vdiskcache).
    • Note: the device Disk could be the disks local to the hypervisor or could even be shared storage to the hypervisors – depends on where the device (VM) disks are placed. Better performance with the former of course. 
  • Cache on device RAM with overflow to device Disk
    • This is a new feature since PVS 7.1. 
    • Rather than use a portion of the device RAM that is not usable by the OS, the RAM cache portion is mapped to the non-paged RAM and used as needed. Thus the OS can use RAM from this pool. Also, the OS gets priority over PVS RAM cache to this non-paged RAM pool.
    • Rather than use a file for the device Disk cache, a new VHDX file is used. It is not possible to use the device Disk buffers though. 

The blog post I linked to also goes into detail on the above. Part 2 of that blog post is amazing for the results it shows and is a must read for these and the general info it provides (e.g. IOPS, how to measure them, etc). Just to summarize though: if you use cache on device RAM with overflow to device Disk, you get tremendous performance benefits. Even just 256 MB of device RAM cache is enough to make a difference.

… the new PVS RAM Cache with Hard Disk Overflow feature is a major game changer when it comes to delivering extreme performance while eliminating the need to buy expensive SAN I/O for both XenApp and Pooled VDI Desktops delivered with XenDesktop. One of the reasons this feature gives such a performance boost even with modest amounts of RAM is due to how it changes the profile for how I/O is written to disk. A XenApp or VDI workload traditionally sends mostly 4K Random write I/O to the disk. This is the hardest I/O for a disk to service and is why VDI has been such a burden on the SAN. With this new cache feature, all I/O is first written to memory which is a major performance boost. When the cache memory is full and overflows to disk, it will flush to a VHDX file on the disk. We flush the data using 2MB page sizes. VHDX with 2MB page sizes give us a huge I/O benefit because instead of 4K random writes, we are now asking the disk to do 2MB sequential writes. This is significantly more efficient and will allow data to be flushed to disk with fewer IOPS.

You no longer need to purchase or even consider purchasing expensive flash or SSD storage for VDI anymore. <snip> VDI can now safely run on cheap tier 3 SATA storage!
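
To put rough numbers on the 4K-versus-2MB point in the quote, here is a back-of-the-envelope sketch. It only counts write operations; it says nothing about how much cheaper a sequential write is than a random one. The 256 MB figure is just the cache size mentioned above, used as a sample, and the function name is my own.

```python
KIB = 1024
MIB = 1024 * KIB

def writes_needed(total_bytes, io_size):
    """Number of write operations needed to move total_bytes in io_size chunks (rounded up)."""
    return -(-total_bytes // io_size)

cache_bytes = 256 * MIB                               # sample size, not a recommendation
random_4k = writes_needed(cache_bytes, 4 * KIB)       # 4K writes
sequential_2m = writes_needed(cache_bytes, 2 * MIB)   # 2MB flushes to the VHDX

print(random_4k, sequential_2m, random_4k // sequential_2m)
```

So moving the same amount of data takes 512 times fewer operations at 2 MB per write, before even accounting for the writes being sequential.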


A follow-up post from someone else at Citrix to the two part blog posts above (1 & 2): PVS RAM Cache overflow sizing. An interesting takeaway: it’s good to defragment the vDisk as that gives up to 30% write cache savings (an additional 15% if the defrag is done while the OS is not loaded). Read the blog post for an explanation of why. Don’t do this with versioned vDisks though. Also, cache on device RAM with overflow to device Disk reserves 2 MB blocks on the cache and writes in 4 KB clusters whereas cache on device Disk used to write in 4 KB clusters without reserving any blocks beforehand. So it might seem like cache on device RAM with overflow to device Disk uses more space, but that’s not really the case …

As a reference to myself for later: LoginVSI seems to be the tool for measuring VDI IOPS. Also, yet to read these but two links on IOPS and VDI (came across these from some blog posts):

[Aside] PVS vs MCS

Haven’t read most of these. Just putting them here for when I need ’em later.

[Aside] NetScaler – CLI Networking

Just putting these two here as a reference to myself (no idea why coz I am sure I’ll just Google and find them later when I need to :p)

As an aside (to this aside):

  • The NetScaler config is stored as ns.conf at /nsconfig
  • Older versions have a .0, .1, .2, etc suffixed to the filename. 
  • Backups are stored in /var/ns_sys_backup.
  • More info on backups etc

[Aside] Useful CA/ Certificates info

[Aside] NetScaler VPX Express limitations etc.

Reading about NetScaler VPX as we are looking at implementing VPX Express in our site as part of a POC. 

  1. VPX Express is limited to 5 Mbps (as opposed to say 1 Gbps for VPX-1000). 
  2. The license is free but you have to keep renewing annually.
  3. The Edition is NetScaler Standard. This and this are two links I found that explain the difference between various editions. 
    1. tl; dr version: Standard is fine for most uses. 
  4. The Gateway part supports 5 concurrent user connections. 
  5. You cannot vMotion or XenMotion VPX. 
  6. This is an excellent blog post on VPX, MPX, and others. Worth a read. 

[Aside] How to Secure an ARM-based Windows Virtual Machine RDP access in Azure

Just putting this here as a bookmark to myself for later. A good post. 

[Aside] NetScaler newnslog files

Some links to myself on the newnslog files (these are binary log files; high precision; need a tool called nsconmsg to view them). 

A typical format of the command is like this:

The <operation> can be one of these (this is just a copy-paste from nsconmsg -?):

The newnslog files are rotated every 2 days (or after a certain number of events, if I remember correctly). The older ones can be accessed by putting the path to that file (e.g. /var/nslog/newnslog.28.tar.gz) in the command above. This will extract the file and show the logs. The Citrix page says we have to extract the logs first, but I am guessing that’s old info. 

That’s all for now. Will add more to this post later …

[Aside] NetScaler SSL

Just putting in these links as bookmarks to myself for future. I kinda followed them while I was trying to change my NetScaler certs (kinda followed, coz I didn’t find these links when I Googled initially, so I just went ahead and figured it out by trying; but later I came across these and thought it would be a good idea to link them here). 

[Aside] How to quickly get ESXi logs from a web browser (without SSH, vSphere client, etc)

This post made my work easy yesterday –

tl;dr version: go to https://IP_of_Your_ESXi/host

[Aside] The Ultimate Guide To Being An Introvert – Altucher Confidential

I tweeted this link but then thought I should put it on my blog too mainly as a reference to myself. Sometimes I wander through my blog looking for wisdom and I hope to find this post then. A great read, especially if you are an introvert and view that/ have been told that it’s a bad thing.

Read the full article (it is long); here’s an excerpt I liked. 

Being an introvert has nothing to do with being shy. Or being outgoing or not outgoing. Or being socially awkward.

All it means is that some people recharge when they are by themselves (introverts).

Other people recharge when they are interacting with many other people (extraverts) and most people are in the middle.

I lose energy very quickly when in a group of people. Getting invited to a party is horrible for me.

I say “no” to almost every social situation. Because I know they will take energy away from me doing the things I love.

If I’m giving a talk it’s no problem. Because I’m by myself on the stage. It’s one to many instead of me just one in a mess of people. I recharge on the stage.

[Aside] Windows Update tools

Wanted to link to these as I came across them while searching for something Windows Updates related.

  • ABC-Update – didn’t try it out but looks useful from a client side point of view. Free.
  • WuInstall – seems to be a client and server thing. Putting it here so I find it if ever needed in future. Paid.
  • Windows Update PowerShell Module – you had me at PowerShell! :0)
    • A blog post explaining this module. Just in case.

[Aside] Memory Resource Management in ESXi

Came across this PDF from VMware while reading on memory management. It’s dated, but a good read. Below are some notes I took while reading it. Wanted to link to the PDF and also put these somewhere; hence this post.

Some terminology:

  • Host physical memory <–[mapped to]– Guest physical memory (continuous virtual address space presented by Hypervisor to Guest OS) <–[mapped to]– Guest virtual memory (continuous virtual address space presented by Guest OS to its applications).
    • Guest virtual -> Guest physical mapping is in Guest OS page tables
    • Guest physical -> Host physical mapping is in pmap data structure
      • There’s also a shadow page table that the Hypervisor maintains for Guest virtual -> Host physical (i.e. the two mappings above combined).
      • A VM does Guest virtual -> Guest physical mapping via hardware Translation Lookaside Buffers (TLBs). The hypervisor intercepts calls to these; and uses these to keep its shadow page tables up to date.
  • Guest physical memory -> Guest swap device (disk) == Guest level paging.
  • Guest physical memory -> Host swap device (disk) == Hypervisor swapping.
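
The chain of mappings above is easier to see with a toy example. This is only a sketch: the page numbers are made up, plain dicts stand in for the real page tables and the pmap, and I am assuming (as I understand the VMware paper) that the shadow page table holds the combined Guest virtual -> Host physical mapping.

```python
PAGE_SIZE = 4096

# Guest OS page table: guest virtual page -> guest physical page (made-up values)
guest_page_table = {0: 7, 1: 3}
# Hypervisor pmap: guest physical page -> host physical page (made-up values)
pmap = {7: 42, 3: 19}

def translate(guest_virtual_addr):
    """Walk both mappings: guest virtual -> guest physical -> host physical."""
    page, offset = divmod(guest_virtual_addr, PAGE_SIZE)
    guest_physical_page = guest_page_table[page]
    host_physical_page = pmap[guest_physical_page]
    return host_physical_page * PAGE_SIZE + offset

# Toy shadow page table: the two mappings composed, so one lookup is enough
shadow_page_table = {gv: pmap[gp] for gv, gp in guest_page_table.items()}

print(translate(PAGE_SIZE + 5))   # an address in guest virtual page 1
print(shadow_page_table)
```

Whenever the Guest OS changes its page table the composed table has to be refreshed, which is why the hypervisor needs to intercept those updates.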

Some interesting bits on the process:

  • Applications use OS provided interfaces to allocate & de-allocate memory.
  • OSes have different implementations on how memory is classified as free or allocated. For example: two lists.
  • A VM has no pre-allocated physical memory.
  • Hypervisor maintains its own data structures for free and allocated memory for a VM.
  • Allocating memory for a VM is easy. When the VM Guest OS makes a request to a certain location, it will generate a page fault. The hypervisor can capture that and allocate memory.
  • De-allocation is tricky because there’s no way for the hypervisor to know the memory is not in use. These lists are internal to the OS. So there’s no straight-forward way to take back memory from a VM.
  • The host physical memory assigned to a VM doesn’t keep growing indefinitely though as the guest OS will free and allocate within the range assigned to it, so it will stick within what it has. And side by side the hypervisor tries to take back memory anyways.
    • Only when the VM tries to access memory that is not actually mapped to host physical memory does a page fault happen. The hypervisor will intercept that and allocate memory.
  • For de-allocation, the hypervisor adds the VM assigned memory to a free list. Actual data in the physical memory may not be modified. Only when that physical memory is subsequently allocated to some other VM does it get zeroed out.
  • Ballooning is one way of reclaiming memory from the VM. This is a driver loaded in the Guest OS.
    • Hypervisor tells ballooning driver how much memory it needs back.
    • Driver will pin those memory pages using Guest OS APIs (so the Guest OS thinks those pages are in use and should not assign to anyone else).
    • Driver will inform Hypervisor it has done this. And Hypervisor will remove the physical backing of those pages from physical memory and assign it to other VMs.
    • Basically the balloon driver inflates the VM’s memory usage, giving it the impression a lot of memory is in use. Hence the term “balloon”.
  • Another way is Hypervisor swapping. In this the Hypervisor swaps to physical disk some of the physical memory it has assigned to the VM. So what the VM thinks is physical memory is actually on disk. This is basically swapping – just that it’s done by Hypervisor, instead of Guest OS.
    • This is not at all preferred coz it’s obviously going to affect VM performance.
    • Moreover, the Guest OS too could swap the same memory pages to its disk if it is under memory pressure. Hence double paging.
  • Ballooning is slow. Hypervisor swapping is fast. Ballooning is preferred though; Hypervisor swapping is only used when under lots of pressure.
  • Host (Hypervisor) has 4 memory states (view this via esxtop, press m).
    • High == All Good
    • Soft == Start ballooning. (Starts before the soft state is actually reached).
    • Hard == Hypervisor swapping too.
    • Low == Hypervisor swapping + block VMs that use more memory than their target allocations.
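
To make the allocate-on-page-fault and ballooning steps concrete, here is a toy model. It is nothing like the real ESXi implementation: the class and method names are my own, memory is counted in whole pages, and swapping and memory states are left out entirely.

```python
class ToyHost:
    """Toy hypervisor: backs guest pages lazily and reclaims them via a balloon."""

    def __init__(self, total_pages):
        self.free_pages = total_pages
        self.backed = set()   # (vm, guest_page) pairs that have host memory behind them

    def touch(self, vm, guest_page):
        """Guest accesses a page. Returns True if that faulted and host memory was allocated."""
        if (vm, guest_page) in self.backed:
            return False
        if self.free_pages == 0:
            raise MemoryError("no free host pages; a real host would balloon or swap here")
        self.free_pages -= 1
        self.backed.add((vm, guest_page))
        return True

    def balloon(self, vm, pinned_pages):
        """The balloon driver in `vm` has pinned these guest pages; drop their host backing."""
        reclaimed = 0
        for page in pinned_pages:
            if (vm, page) in self.backed:
                self.backed.remove((vm, page))
                self.free_pages += 1
                reclaimed += 1
        return reclaimed


host = ToyHost(total_pages=4)
for page in (0, 1, 2):
    host.touch("vm1", page)                # first access to each page faults and allocates
print(host.free_pages)
print(host.balloon("vm1", [1, 2]))         # balloon inflates inside vm1
print(host.touch("vm2", 0), host.free_pages)
```

The point of the balloon is visible in the last two lines: the host never had to guess which of vm1's pages were unused, the driver inside the guest picked them.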


[Aside] Bridgehead Server Selection improvements in Server 2008 R2

Came across this blog post when researching for something. Long time since I read anything AD related (since I am more focused on VMware and HP servers) at work nowadays. Was a good read.

Summary of the post:

  • When you have a domain spread over multiple sites, there is a designated Bridgehead server in each site that replicates changes with/ to the Bridgehead server in other sites.
    • Bridgehead servers talk to each other via IP or SMTP.
    • Bridgehead servers are per partition of the domain. A single Bridgehead server can replicate for multiple partitions and transports.
    • Since Server 2003 there can be multiple Bridgehead servers per partition in the domain and connections can be load-balanced amongst these. The connections will be load-balanced to Server 2000 DCs as well.
  • Bridgehead servers are automatically selected (by default). The selection is made by a DC that holds the Inter-Site Topology Generator (ISTG) role.
    • The DC holding the ISTG role is usually the first DC in the site (the role will failover to another DC if this one fails; also, the role can be manually moved to another DC).
    • It is possible to designate certain DCs as preferred Bridgehead servers. In this case the ISTG will choose a Bridgehead server from this list.
  • It is also possible to manually create connections from one DC to another for each site and partition, avoiding ISTG altogether.
  • On each DC there is a process called the Knowledge Consistency Checker (KCC). This process is what actually creates the replication topology for the domain.
  • The KCC process running on the DC holding the ISTG role is what selects the Bridgehead servers.

The above was just background. Now on to the improvements in Server 2008 R2:

  • As mentioned above, in Server 2000 you had one Bridgehead server per partition per site.
  • In Server 2003 you could have multiple Bridgehead servers per partition per site. There was no automatic load-balancing though – you had to use a tool such as Adlb.exe to manually load-balance among the multiple Bridgehead servers.
  • In Server 2008 you had automatic load-balancing. But only for Read-Only Domain Controllers (RODCs).
    • So if Site A had 5 DCs, the RODCs in other sites would load-balance their incoming connections (remember RODCs only have incoming connections) across these 5 DCs. If a 6th DC was added to Site A, the RODCs would automatically load-balance with that new DC.
    • Regular DCs (Read-Write DCs) too would load-balance their incoming connections across these 5 DCs. But if a 6th DC was added they wouldn’t automatically load-balance with that new DC. You would still need to run a tool like Adlb.exe to load-balance (or delete the inbound connection objects on these regular DCs and run KCC again?).
    • Regular DCs would sort of load-balance their outbound connections to Site A. The majority of incoming connections to Site A would still hit a single DC.
  • In Server 2008 R2 you have complete automatic load-balancing. Even for regular DCs.
    • In the above example: not only would the regular DCs automatically load-balance their incoming connections with the new 6th DC, but they would also load-balance their outbound connections with the DCs in Site A (and when the new DC is added automatically load-balance with that too). 
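
The Site A example can be sketched as a toy. To be clear, the round-robin below is my own stand-in and not the algorithm the KCC actually uses, and the DC names are made up. It only shows the difference between connections that stay put when a 6th DC appears and connections that get recomputed.

```python
from collections import Counter
from itertools import cycle

def assign(remote_dcs, bridgeheads):
    """Spread inbound connections from remote DCs round-robin across candidate bridgeheads."""
    targets = cycle(bridgeheads)
    return {dc: next(targets) for dc in remote_dcs}

site_a = [f"A-DC{i}" for i in range(1, 6)]     # the 5 DCs in Site A
remote = [f"B-DC{i}" for i in range(1, 11)]    # 10 hypothetical DCs in another site

before = assign(remote, site_a)
site_a.append("A-DC6")                         # a 6th DC is added to Site A

sticky = before                                # Server 2008 regular DCs: existing connections stay as they were
rebalanced = assign(remote, site_a)            # Server 2008 R2: connections take the new DC into account

print(Counter(sticky.values()))
print(Counter(rebalanced.values()))
```

In the sticky case A-DC6 gets no connections until someone intervenes; in the rebalanced case it picks up its share straight away.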

To view Bridgeheads connected to a DC run the following command:

The KCC runs every 15 minutes (can be changed via registry). The following command runs it manually:

Also, the KCC prefers DCs that are more stable / readily available over DCs that are intermittently available. Thus DCs that are offline for an extended period do not get rebalanced automatically when they come back online (at least not immediately).

[Aside] Interesting stuff to read/ listen/ watch

  • How GitHub Conquered Google, Microsoft, and Everyone Else | WIRED
    • How GitHub has taken over as the go-to code repository for everyone, even Google, Microsoft, etc. So much so that Google shut down Google Code, and while Microsoft still has their Codeplex up and running as an alternative, they too post to GitHub as that’s where all the developers are.
    • The article is worth a read for how Git makes this possible. In the past, with Centralized Version Control Systems (CVCS) such as Subversion, the master copy of your code was with this central repository and so there was a fear of what would happen if that central repository went down. But with Distributed Version Control Systems (DVCS) there’s no fear of such a thing happening because your code lives locally on your machine too.
  • Channel 9 talks on DSC. A few weeks ago I had tried attending this Jeffrey Snover talk on PowerShell Desired State Configuration (DSC) but I couldn’t because of bandwidth issues. Those talks are now online (been 5 days I think); here are the links for anyone interested:
  • Solve for X | NPR TED Radio Hour
  • Becoming Steve Jobs
    • A new book on Steve Jobs. Based on extensive interviews of people at Apple. Seems to offer a more “truthful” picture of Steve Jobs than that other book.
    • Discovered via Prismatic (I don’t have the original link, sorry).
    • Apparently Tim Cook even offered Steve Jobs his liver to help with his health. Nice!
  • Why you shouldn’t buy a NAS like Synology, Drobo, etc.
    • Came across this via Prismatic. Putting it here because this is something I was thinking of writing a blog post about myself.
    • Once upon a time I used to have Linux servers running Samba. Later I tried FreeBSD+ZFS and Samba. Lately I have been thinking of using FreeNAS. But each time I scrap all those attempts/ ideas and stick with running all my file shares off my Windows 8.1 desktop. Simply because it offers native NTFS support and that works best in my situation as all my clients are Windows and I have things set up the way I want with NTFS permissions etc.
    • Samba is great but if your clients are primarily Windows then it’s a hassle, I think. Better to stick with Windows on the server end too.
    • Another reason I wouldn’t go with a NAS solution is because I am then dependent on the NAS box. Sure it’s convenient and all, but if that box fails then I have to get a similar box just to read my data off the disks (assuming I can take out disks from one box and put into another). But with my current setup I have no such issues. I have a bunch of internal and external disks attached to my desktop PC; if that PC were to ever fail, I can easily plug these into any spare PC/ laptop and everything’s accessible as before.
    • I don’t do anything fancy in terms of mirroring for redundancy either! I have a batch file that does a robocopy between the primary disk and its backup disk every night. This way if a disk fails I only lose the last 24 hours of data at most. And if I know I have added lots of data recently, I run the batch file manually just in case.
      • It’s good to keep an offsite backup too. For this reason I use CrashPlan to backup data offsite. That’s the backup to my backup. Just in case …
    • If I get a chance I want to convert some of my external disks to internal and/ or USB 3.0. That’s the only plan I currently have in mind for these.
  • EMET 5.2 is now available! (via)
    • I’ve been an EMET user for over a year now. Came across it via the Security Now! podcast.
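
Going back to the nightly disk-to-disk copy in the NAS item above: the real thing is a batch file calling robocopy, but the idea can be sketched in a few lines of Python. This is an illustration only. The function name and paths are hypothetical, it only copies files that are missing or newer on the primary disk, and it never deletes anything from the backup.

```python
import shutil
from pathlib import Path

def backup(primary, mirror):
    """Copy files that are missing from, or newer than, their copy under mirror."""
    primary, mirror = Path(primary), Path(mirror)
    copied = []
    for src in primary.rglob("*"):
        if not src.is_file():
            continue
        dst = mirror / src.relative_to(primary)
        if dst.exists() and dst.stat().st_mtime >= src.stat().st_mtime:
            continue                      # backup copy is already up to date
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)            # copy2 also carries over the modification time
        copied.append(dst)
    return copied

# Hypothetical usage, e.g. from a nightly scheduled task:
# backup("D:/Data", "E:/Backup/Data")
```

Because copy2 preserves timestamps, a second run straight after the first has nothing left to copy.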