[Aside] PVS Caching

Was reading this blog post (PVS Cache in RAM with Disk Overflow) when I came across a Citrix KB article which mentioned that this feature was introduced because of ASLR, which arrived with Windows Vista. Apparently, setting the PVS cache to the target device hard disk causes issues with ASLR. Not sure how ASLR (which is a memory thing) should be affected by disk write cache choices, but there you go. It’s something to do with PVS modifying the Memory Descriptor List (MDL) before writing it to the disk cache; when Windows later reads it back and finds the MDL has changed from what it expected, it crashes because ASLR treats the modification as tampering.

Anyhow, while Googling that I came across this nice Citrix article on the various types of caching PVS offers:

  • Cache on the PVS Server (not recommended in production due to poor performance)
  • Cache on device RAM
    • A portion of the device’s RAM is reserved as cache and not usable by the OS. 
  • Cache on device Disk
    • It’s also possible to make use of the device Disk’s buffers (i.e. the hardware disk cache). This is disabled by default but can be enabled.
    • This is actually implemented via a file on the device Disk (called .vdiskcache).
    • Note: the device Disk could be a disk local to the hypervisor or even shared storage attached to the hypervisors – it depends on where the device (VM) disks are placed. Better performance with the former, of course.
  • Cache on device RAM with overflow to device Disk
    • This is a new feature introduced in PVS 7.1.
    • Rather than reserving a portion of the device RAM that the OS cannot use, the RAM cache portion is mapped to non-paged RAM and used as needed. Thus the OS can use RAM from this pool too. Also, the OS gets priority over the PVS RAM cache to this non-paged RAM pool.
    • Rather than a file for the device Disk cache, a VHDX file is used. It is not possible to use the device Disk buffers with this option, though. (See the sketch after this list for a rough model of the overflow behavior.)
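
A minimal Python sketch of the overflow mechanics, purely to make the idea concrete. This is my own toy model, not PVS internals: writes land in a RAM pool first, and once the pool exceeds its configured size, whole 2 MB blocks are flushed sequentially to a backing file (standing in for the VHDX). All names and sizes are made up for illustration.

```python
# Toy model of "cache in device RAM with overflow to device Disk".
# Illustration only -- the real PVS driver works at the block level
# inside Windows; all names and sizes here are made up.

BLOCK_SIZE = 2 * 1024 * 1024  # PVS reserves and flushes 2 MB blocks

class RamCacheWithOverflow:
    def __init__(self, ram_limit_bytes, overflow_path):
        self.ram_limit = ram_limit_bytes
        self.ram = {}  # block number -> bytearray (the RAM cache)
        self.overflow = open(overflow_path, "wb")  # stand-in for the VHDX

    def write(self, offset, data):
        """Writes always land in RAM first (the big performance win)."""
        block = offset // BLOCK_SIZE
        buf = self.ram.setdefault(block, bytearray(BLOCK_SIZE))
        start = offset % BLOCK_SIZE  # assumes the write fits in one block
        buf[start:start + len(data)] = data
        self._maybe_overflow()

    def _maybe_overflow(self):
        # If RAM use exceeds the limit, evict the oldest blocks to disk
        # as whole 2 MB sequential writes instead of 4K random ones.
        while len(self.ram) * BLOCK_SIZE > self.ram_limit:
            block, buf = next(iter(self.ram.items()))  # oldest insertion
            self.overflow.seek(block * BLOCK_SIZE)
            self.overflow.write(buf)
            del self.ram[block]

cache = RamCacheWithOverflow(256 * 1024 * 1024, "overflow.bin")
cache.write(4096, b"some 4K-cluster sized write")  # cached in RAM
```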

The blog post I linked to also goes into detail on the above. Part 2 of that blog post is amazing for the results it shows and is a must-read both for those and for the general info it provides (e.g. what IOPS are, how to measure them, etc.). Just to summarize though: if you use cache on device RAM with overflow to device Disk, you get tremendous performance benefits. Even a 256 MB device RAM cache is enough to make a difference.

… the new PVS RAM Cache with Hard Disk Overflow feature is a major game changer when it comes to delivering extreme performance while eliminating the need to buy expensive SAN I/O for both XenApp and Pooled VDI Desktops delivered with XenDesktop. One of the reasons this feature gives such a performance boost even with modest amounts of RAM is due to how it changes the profile for how I/O is written to disk. A XenApp or VDI workload traditionally sends mostly 4K Random write I/O to the disk. This is the hardest I/O for a disk to service and is why VDI has been such a burden on the SAN. With this new cache feature, all I/O is first written to memory which is a major performance boost. When the cache memory is full and overflows to disk, it will flush to a VHDX file on the disk. We flush the data using 2MB page sizes. VHDX with 2MB page sizes give us a huge I/O benefit because instead of 4K random writes, we are now asking the disk to do 2MB sequential writes. This is significantly more efficient and will allow data to be flushed to disk with fewer IOPS.

You no longer need to purchase or even consider purchasing expensive flash or SSD storage for VDI anymore. <snip> VDI can now safely run on cheap tier 3 SATA storage!

Nice!
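
To put rough numbers on the quote’s claim (the 100 MB figure below is my own example; the 4K and 2 MB sizes are from the quote):

```python
# Back-of-the-envelope using the sizes from the quote: flushing 100 MB
# of dirty cache as 4K random writes vs as 2 MB sequential writes.
dirty_mb = 100                        # example amount of data to flush
ios_4k_random = dirty_mb * 1024 // 4  # 25,600 separate 4K random IOs
ios_2mb_seq = dirty_mb // 2           # just 50 sequential 2 MB IOs
print(ios_4k_random, ios_2mb_seq)     # 25600 50
```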

A follow-up post from someone else at Citrix to the two-part blog post above (1 & 2): PVS RAM Cache overflow sizing. An interesting takeaway: it’s good to defragment the vDisk, as that gives up to 30% write cache savings (an additional 15% if the defrag is done while the OS is not loaded). Read the blog post for an explanation of why. Don’t do this with versioned vDisks, though. Also, cache on device RAM with overflow to device Disk reserves 2 MB blocks in the cache and writes in 4 KB clusters, whereas the older cache on device Disk wrote in 4 KB clusters without reserving any blocks beforehand. So it might seem like cache on device RAM with overflow to device Disk uses more space, but that’s not really the case (see the sketch below for the arithmetic) …
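
To illustrate that last point with made-up numbers: a 2 MB block is reserved the first time any 4 KB cluster inside it is written, so the overflow file can look bigger than the amount of data actually written, even though the real data is the same. A small sketch (my own illustration, not PVS code):

```python
# Sketch: apparent vs actual cache usage when 2 MB blocks are reserved
# up front but data arrives as 4 KB clusters. Offsets are illustrative.
BLOCK = 2 * 1024 * 1024
CLUSTER = 4 * 1024

def cache_usage(write_offsets):
    """Return (bytes of data written, bytes reserved in the cache)."""
    clusters = {off // CLUSTER for off in write_offsets}
    blocks = {off // BLOCK for off in write_offsets}
    return len(clusters) * CLUSTER, len(blocks) * BLOCK

# Ten scattered 4K writes, each landing in a different 2 MB block:
print(cache_usage([i * BLOCK for i in range(10)]))    # (40960, 20971520)

# The same ten writes packed into a single block reserve far less:
print(cache_usage([i * CLUSTER for i in range(10)]))  # (40960, 2097152)
```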

As a reference to myself for later: LoginVSI seems to be the tool for measuring VDI IOPS. Also, I have yet to read these, but here are two links on IOPS and VDI (came across them via some blog posts):