Directory Junctions and Volume Shadow Snapshots

Stumbled upon something by accident yesterday. Figured it would be a good idea to post about it in case it helps someone else. Before going into that though it’s worth briefly mentioning the various types of links you can have on NTFS.

Hard links

Same data, but multiple links to it. This only applies to files (because files are what contain data). Think of it like Unix hard links. Hard links can only be made to data in the same volume. That is, you can have hard links from c:\abc.txt to c:\def.txt but you cannot have a hard link to d:\def.txt (because it’s on a different volume).

An interesting side effect of hard links is that because all the links point to the data, deleting any one of those links still leaves the data accessible through the others. The original file which was created to hold the data is no longer its sole owner. The data is what matters, not the file associated with it (as you can have multiple files – the hard links – associated with the same data). The data is actually deleted only when all links pointing to it are removed. (Each time a link is created to the data Windows increments a counter in the file table for that data; when a link is removed the counter is decremented.) NTFS has a limit of 1023 links to the same data.
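A quick way to watch the link counter move is the sketch below. It is only an illustration: the file names are the ones from the example above, everything happens in a throwaway temp folder, and it assumes the folder sits on a file system that supports hard links (NTFS does).

```python
import os
import tempfile


def hard_link_demo():
    """Create a file, hard-link it, delete the first name, report what survives."""
    with tempfile.TemporaryDirectory() as tmp:
        original = os.path.join(tmp, "abc.txt")
        link = os.path.join(tmp, "def.txt")

        with open(original, "w") as f:
            f.write("hello")

        os.link(original, link)                    # second name for the same data
        count_with_both = os.stat(link).st_nlink   # link counter kept by the file system

        os.remove(original)                        # remove the "original" name
        count_after_delete = os.stat(link).st_nlink

        with open(link) as f:
            surviving_data = f.read()              # data is still reachable via def.txt

    return count_with_both, count_after_delete, surviving_data


if __name__ == "__main__":
    print(hard_link_demo())  # (2, 1, 'hello')
```

The counter goes from 2 to 1 when abc.txt is removed, and the data is still readable through def.txt.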

I use hard links at home while taking backups. I use robocopy to back up my folders, and since I am copying the whole folder each time I end up with multiple copies of the same file – all taking up unnecessary space. To avoid this it’s possible to be smart and only create hard links (instead of copying) to files that are unchanged. This way all your backups point to the same data and even if the original file is deleted the data still remains as long as the links point to it. I came across this idea in my Linux days; on Windows I use the excellent DeLorean Copy for this (to be correct, I use the command-line version).
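Here is a minimal sketch of the idea, not of how DeLorean Copy itself works (I haven’t looked at its internals). The function name and folder layout are made up for the illustration, and it assumes the previous and the new backup folders are on the same volume, since hard links can’t cross volumes.

```python
import filecmp
import os
import shutil
import tempfile


def backup_with_hard_links(source, previous, current):
    """Back up `source` into `current`; files unchanged since `previous` become hard links.

    `previous` may be None for the very first backup. Returns (linked, copied).
    """
    linked = copied = 0
    for root, _dirs, files in os.walk(source):
        rel = os.path.relpath(root, source)
        os.makedirs(os.path.normpath(os.path.join(current, rel)), exist_ok=True)
        for name in files:
            src = os.path.join(root, name)
            dst = os.path.normpath(os.path.join(current, rel, name))
            old = os.path.normpath(os.path.join(previous, rel, name)) if previous else None
            # shallow=False compares the file contents, not just size and timestamp
            if old and os.path.isfile(old) and filecmp.cmp(src, old, shallow=False):
                os.link(old, dst)        # unchanged: new name, same data
                linked += 1
            else:
                shutil.copy2(src, dst)   # new or changed: a real copy
                copied += 1
    return linked, copied


def demo_two_backups():
    """Two backup runs with one file changed in between."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "src")
        os.makedirs(src)
        for name, text in (("a.txt", "one"), ("b.txt", "two")):
            with open(os.path.join(src, name), "w") as f:
                f.write(text)

        first = backup_with_hard_links(src, None, os.path.join(tmp, "backup1"))

        with open(os.path.join(src, "b.txt"), "w") as f:
            f.write("changed")

        second = backup_with_hard_links(
            src, os.path.join(tmp, "backup1"), os.path.join(tmp, "backup2")
        )
        links_to_a = os.stat(os.path.join(tmp, "backup2", "a.txt")).st_nlink
    return first, second, links_to_a


if __name__ == "__main__":
    print(demo_two_backups())  # ((0, 2), (1, 1), 2)
```

The second run links a.txt (unchanged) and copies b.txt (changed), so only the changed file takes up new space.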

Worth repeating – hard links are only for files (because files are what contain data).

Soft links

The data has a file associated with it (as usual). You create soft links to the file. Unlike a hard link, where everything points to the data, here everything points to the original file containing the data. That original file is still important and if it is deleted all the links become invalid.

Soft links can be for files and folders. The target they point to can be on different volumes or even network shares. Moreover, the path to the target can be absolute or relative (if it’s relative then obviously it can only be to a target on the same volume). Also, the target needn’t even exist when creating the soft link! Only when the soft link is actually accessed must the target exist. (Makes sense when you think of how the target can be a network share that may not always be accessible.) This way if a target pointed to by a soft link is deleted, the soft link still exists – it just won’t work. Recreate the deleted target and the soft link will continue as usual.
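The "target needn’t exist" behaviour is easy to try out. A small sketch, with made-up file names in a temp folder; note that on Windows the call needs the symbolic link privilege (or an elevated prompt), otherwise os.symlink raises an error.

```python
import os
import tempfile


def dangling_symlink_demo():
    """Create a link before its target exists, then add and remove the target."""
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "target.txt")
        link = os.path.join(tmp, "link.txt")

        os.symlink(target, link)                 # the target does not exist yet
        link_exists = os.path.islink(link)       # the link itself is there
        resolves_before = os.path.exists(link)   # exists() follows the link

        with open(target, "w") as f:
            f.write("now I exist")
        resolves_after_create = os.path.exists(link)

        os.remove(target)                        # delete the target again
        resolves_after_delete = os.path.exists(link)
        still_a_link = os.path.islink(link)      # the link survives, it just dangles

    return (link_exists, resolves_before, resolves_after_create,
            resolves_after_delete, still_a_link)


if __name__ == "__main__":
    print(dangling_symlink_demo())  # (True, False, True, False, True)
```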

Although soft links point to a target and not the actual data, that is transparent to end-users. For an end-user the link appears just like a regular file or folder. Some people move folders from their C: drive to other locations via soft links. Don’t do this for all folders though (for example: ProgramData). Also, apparently symbolic links do not work at boot so the Windows folder can’t be redirected. I have used symbolic links to sync folders outside my Dropbox folder with Dropbox.

Not everyone can create soft links. The SeCreateSymbolicLinkPrivilege privilege must be present to create soft links. By default administrators have this privilege; for non-administrators this privilege can be granted via Security Policies.

Directory Junction

A special type of folder soft link is a Directory Junction. A directory junction is like a folder soft link except that it cannot point to network shares. Directory Junctions were introduced in Windows 2000 and make use of something called “reparse points”, which is an NTFS feature (as an aside: reparse points are also what OneDrive/ SkyDrive use for selective sync). There are two types of “junctions” possible – directory junctions, which redirect one directory to another; and volume junctions, which redirect a directory to a volume. In both types of junctions the target is an absolute path – not relative, and not a network share.

Soft links/ Symbolic links are an evolution of Directory Junctions (though I present the latter as a special case of the former). Soft links/ Symbolic links were introduced in Vista and are baked into the kernel. They behave like *nix symbolic links. Oddly, however, even though directory junctions were present from Windows 2000 they were never widely used, but once symbolic links were introduced directory junctions became more widely used (in fact Vista and upwards use directory junctions to redirect legacy folders such as “Documents and Settings” to “Users”). Vista also introduced tools such as mklink to create directory junctions, soft links, and hard links.

Moving on to what I stumbled upon yesterday.

Consider the following:

I have a folder and a symbolic link and a directory junction pointing to that folder.

The folder has a file a.txt in it. Thus both the symbolic link and directory junction too will show this file.

Now say I make a shadow copy of this drive (the C: drive in my case) and mount it someplace (say C:\Shadow). This location is a shadow copy of the C: drive and as such it too will contain the folder and two links I created above.

Here’s the catch though …

Say I add a new file b.txt to the folder on C: drive (C:\TEST\Folder). One would expect that file to be present in the two links of the C: drive – obviously – but not in the two links of the shadow C: drive. But that’s just not what happens. The file is present in the two links on the C: drive and also the directory junction of the shadow volume!

This bit me yesterday because for various reasons I had some directory junctions in my C: drive and I was trying to back them up via shadow copies. But the backups kept failing as the files in the shadow copy were in use (because they were pointing to the live volume by virtue of being directory junctions!) and that was an unexpected behavior. After some troubleshooting I realized directory junctions were the culprit.

Later, I read up on reparse points and VSS and realized that while the reparse point will be backed up via VSS, the backed up location cannot be traversed in the shadow copy. Apparently, if the reparse point target is a separate volume (i.e. a volume junction), that volume will be shadow copied and must be accessed as an independent shadow copy. But if the reparse point is a directory (i.e. a directory junction like in my case) I am not sure what happens. It looks like it isn’t shadow copied, and the directory junction is repointed to the target on the main volume.

Moral of the story: stick to symbolic links rather than directory junctions? (Update this part once I have a better answer …)

Update: I changed the two links above to point to a folder on another drive (D:). Then I took a snapshot of the C: drive. When I accessed the links in the snapshot, they were still referring to the live D: drive and not a shadow version of it. In fact, no shadow of D: drive was automatically created when I took a shadow of C: drive. I tried to create a manual snapshot of D: drive but that failed because I use TrueCrypt and apparently that doesn’t support snapshots (the error I got was VSS_E_UNEXPECTED_PROVIDER_ERROR; see this and this).

Next I created two VHDs to simulate two drives.

I created links from folders in the G: drive to a folder in the H: drive, took snapshots of both drives, and mounted them as c:\ShadowG and c:\ShadowH. Yet when I go to the links in c:\ShadowG they still point to the live volume H: and not the snapshot of H:.

Odd! Even more odd is the fact that now both the directory junction and the symbolic link point to the live volume. So I guess the revised moral of the story is that when using directory junctions or symbolic links, try and refer to the actual target rather than the link/ junction itself. The link/ junction points to the live file system, not the shadow copy.

This also means if you were to mount an older shadow copy to try and retrieve some file you deleted from your Desktop for instance, go and check under \path\to\shadow\Users\... rather than \path\to\shadow\Documents and Settings\... – the latter is a junction and will always point to the live system, not the shadow copy as you’d expect.
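To avoid walking into a link by accident, it helps to check which parts of a path are links before trusting what is under it. The helper below is a rough sketch of that check. The function name is my own, and junction detection relies on os.path.isjunction, which only exists in Python 3.12 and later; on older versions the sketch only spots symbolic links.

```python
import os
import tempfile


def link_components(path):
    """Return every prefix of `path` that is a symbolic link or a junction."""
    # os.path.isjunction was added in Python 3.12; treat it as "never" before that.
    is_junction = getattr(os.path, "isjunction", lambda p: False)
    found = []
    # abspath normalises the path but does not resolve links, which is what we want.
    drive, rest = os.path.splitdrive(os.path.abspath(path))
    current = drive + os.sep
    for part in rest.split(os.sep):
        if not part:
            continue
        current = os.path.join(current, part)
        if os.path.islink(current) or is_junction(current):
            found.append(current)
    return found


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        base = os.path.realpath(tmp)             # make sure the base itself has no links
        real = os.path.join(base, "Users")
        os.makedirs(os.path.join(real, "me"))
        legacy = os.path.join(base, "Documents and Settings")
        os.symlink(real, legacy, target_is_directory=True)

        print(link_components(os.path.join(legacy, "me")))  # one entry: the link
        print(link_components(os.path.join(real, "me")))    # []
```

Anything the function returns is a place where the path leaves the shadow copy and lands on the live file system.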

Notes on Volume Shadow Copy in Windows (or How to backup open PST files via Robocopy)

Since XP and Server 2003 Windows has had this cool feature called Volume Shadow Copy (also known as Shadow Copy or Volume Snapshot Service (VSS)). It lets you take read-only snapshots of your file system so you can then trawl through them or take backups and such. When Windows creates system restore points or does backups this is what it uses. Without VSS Windows wouldn’t be able to back up files that are in use by the system/ applications as these files are locked; but with VSS it can take a “snapshot” of everything as it is at that point in time and the backup program or system restore can access the files in this snapshot.

A good overview of the Volume Shadow Copy service can be found in these two (1 & 2) TechNet articles. What follows is a tl;dr version but I suggest you read the original articles.

• Volume Shadow Copy consists of 4 components:
1. the VSS requester, which is the software that actually requests that a shadow copy be made – e.g. Windows Backup;
2. the VSS provider, which is the component that actually creates and maintains shadow copies – e.g. Windows includes a VSS provider that allows for Copy-on-Write (COW) snapshots, and SAN solutions usually have their own VSS providers;
   1. there are three types of providers – hardware providers, software providers, and the system provider which is a part of Windows (see the section on Shadow Copy Providers in this link)
3. the VSS writer, which is a component of Windows or installed software whose role is to ensure that any data belonging to Windows (e.g. Registry) or such software (e.g. Active Directory, SQL or Exchange databases) is in a consistent state for the shadow copy to be made;
   1. Windows includes many writers (see the section on In-Box VSS writers in this link)
   2. Databases such as Active Directory and Exchange use transaction logs. Which is to say changes are not written to the database directly, rather they are written to memory first and a note made in a special file (the “transaction log”). During periods of non-peak activity changes in the transaction log are committed to the actual database. This way even if the database were to crash during a transaction, when it comes back up the transaction log can be “replayed” and any uncommitted transactions can be added to the database. Here are some links which explain this concept for Active Directory (1 & 2) and Exchange (1).
   3. The VSS writer component of Active Directory or Exchange ensures that when a snapshot is taken the database is in a consistent state: pending transactions have either been written to it or are still waiting to be written, but are never in the middle of being committed.
4. the VSS service, which is the coordinator for all the components above.
• Here’s how it all falls into place:
1. The requester tells the service that it wants a shadow copy and so it needs a list of the writers on the system with their metadata.
2. The service asks all the writers to get back with the information. Each writer creates an XML description of its components and data stores. Each writer also defines a restore method.
3. The service passes the above information to the requester. The requester selects the components it wants to shadow copy.
4. The service now notifies the writers of the selected components to prepare their data (complete any open transactions, flush caches, roll transaction logs, etc).
5. Each writer does so and notifies the service.
6. The service tells each writer to temporarily freeze all application write I/O requests (read I/O requests are still allowed).
   1. This freeze is not allowed to take longer than 60 seconds.
   2. The service also flushes the file system cache.
7. The service then notifies the provider. The provider makes a shadow copy – this is not allowed to last for longer than 10 seconds, during which all write I/O requests to that file system are frozen.
8. Once the copy is done, the service releases the pending write I/O requests.
9. The service also notifies the writers that they are free to resume writing data as before.
10. The service informs the requester of the location of the shadow copy.
• There are three methods a provider can use to shadow copy:
1. Complete copy – makes a read-only clone of the original volume – this is only done by hardware providers
2. Redirect on Write – leaves the original volume untouched, but creates a new volume and any changes are now redirected to this new volume
3. Copy on Write (COW) – does not make a copy of the original volume; instead, when a block is about to change after the shadow copy is taken, its original contents are first copied to a “differences area”, and when the shadow copy needs to be accessed it is logically constructed from the original volume and the blocks saved in the “differences area” – this can be done by both software and hardware providers.
• Paging and other temporary files are automatically excluded from snapshots. The FilesNotToSnapshot registry key can be used to exclude additional files from a shadow copy (it is meant to be used by applications, not end users).
• For the system provider the shadow copy storage area (also called the “differences area” above) must be an NTFS volume. The volume to be shadow copied need not be an NTFS volume.
   1. The “differences area” need not necessarily be on the same volume as the one being shadow copied. If the volume already has a “differences area”, that is used. Otherwise a “differences area” can be manually associated with a volume. Else a volume is automatically selected based on available free space (with preference being given to space on the volume that’s being shadow copied).
   2. Although the volume being shadow copied can be non-NTFS, if you are creating persistent shadow copies then it must be NTFS. (Persistent shadow copies exist even after the shadow copy operation is done. Non-persistent shadow copies exist only for the duration of the operation – such as a backup – and are deleted afterwards.)
• The maximum number of shadow copied volumes in a single shadow copy set is 64. The maximum number of shadow copies created by the system provider for a volume is 512.
• There are three tools (VSS requesters) you can use to manually make and manage shadow copies.
1. DiskShadow which is present on Windows Server 2008/ Windows Vista and upwards (only on the server versions I think, as I couldn’t find it on my Windows 7 or Windows 8 install)
2. VssAdmin which is present on Windows Server 2003/ Windows XP and upwards (the client versions are different from the server versions I think, the version on my Windows 7 and Windows 8 machines didn’t have an option to create shadows but only list and delete shadows)
3. VShadow which is a part of an SDK package (see this thread for more info) and is better than VssAdmin. Good thing about VShadow is that someone has kindly extracted it from the SDK and made available for download. So if you are on a client version of Windows that’s probably what you will go with.
• The FAQ section on this link has more bits and pieces of use (no point in me regurgitating it here! :)).
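To make the Copy on Write method from the list above a bit more concrete, here is a toy model of it. It is only a sketch of the idea and not how VSS is implemented internally: the class and method names are my own, a "volume" is just a Python list of blocks, and there is a single snapshot. The point is that taking the snapshot copies nothing; the old contents of a block are saved to the differences area only when that block is first overwritten.

```python
class CowVolume:
    """Toy volume with one copy-on-write snapshot."""

    def __init__(self, blocks):
        self.blocks = list(blocks)   # live data, one entry per block
        self.diff_area = None        # block index -> contents at snapshot time

    def snapshot(self):
        # Taking the snapshot copies no data; it only starts tracking changes.
        self.diff_area = {}

    def write(self, index, data):
        # First write to a block after the snapshot: save the old contents first.
        if self.diff_area is not None and index not in self.diff_area:
            self.diff_area[index] = self.blocks[index]
        self.blocks[index] = data

    def read_snapshot(self, index):
        # Snapshot view = saved old block if the block changed, else the live block.
        if self.diff_area is None:
            raise RuntimeError("no snapshot has been taken")
        return self.diff_area.get(index, self.blocks[index])


if __name__ == "__main__":
    vol = CowVolume(["a", "b", "c"])
    vol.snapshot()
    vol.write(1, "B")
    vol.write(1, "BB")   # second write to the same block saves nothing new

    print(vol.blocks)                                  # ['a', 'BB', 'c']
    print([vol.read_snapshot(i) for i in range(3)])    # ['a', 'b', 'c']
    print(vol.diff_area)                               # {1: 'b'}
```

Only the one block that changed ends up in the differences area, which is why COW snapshots are cheap to create and only grow as the live volume changes.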

Practical Example

I have a USB stick with PST files and such. I can’t take a backup of this USB stick while Outlook is running as the PST files are locked by it. So what I need to do is use VSS to manually make a shadow copy, then expose said shadow copy as a drive/ path, and use my regular backup tool (robocopy in this case) to back up the files in this shadow copy. Easy peasy? It is!

To create a shadow here’s the command to use:

This, however, creates a non-persistent shadow copy. I want a persistent shadow copy though as I want to access it even after VShadow exits. So I add the -p switch to create persistent copies.

Here’s a snippet of the output:

Notice how the VSS service determines which components are to be excluded from the shadow and accordingly excludes those Writers? Then it contacts them (not shown) and proceeds to create a shadow. The end result of this operation will be output similar to this:

A shadow copy is a snapshot of a volume at one point in time. Each shadow copy has a GUID. A shadow copy set is a collection of shadow copies of various volumes all taken at the same point in time. Each shadow copy set has a persistent GUID. These details are shown in the output above, along with the original volume name and others. The shadow copy gets a device name of its own, which you can mount via mklink (be sure to add a trailing backslash to the device name, i.e. \\?\GLOBALROOT\Device\HarddiskVolumeShadowCopy16 becomes \\?\GLOBALROOT\Device\HarddiskVolumeShadowCopy16\):

or

The former creates a directory symbolic link (mklink /d), the latter creates a junction (mklink /j). It doesn’t make a difference in this case. The directory must not exist beforehand.

You can mount persistent shadow copies via their GUID too as below:

This mounts the shadow copy of GUID {dbc75a3a-0a6c-4540-8ba2-c2d3665b3cc1} at the specified drive letter. You don’t specifically need a drive letter. Any empty folder can be used too as a mount point. Even better, you can expose a shadow copy as a shared folder. The command below will expose the previous shadow copy as a shared folder of the specified name.

It’s also possible to expose only a sub-folder of the shadow copy as the shared folder.

Wouldn’t it be cool though if there was a way for the shadow creating command to pass on the GUID and other details so the exposing/ mounting command can easily do that? That would make it very easy in batch files and such, and indeed VShadow has such an option. The command below, which is a modification of the original command, will create a new CMD file which when run populates a set of environment variables that contain the shadow GUID and such.

The contents of the vss-setvar.cmd file will be similar to this (this output is from a different shadow copy hence the GUIDs vary from the output above):

As you can see all it does is set some environment variables containing the snapshot ID and snapshot set ID. What this means is that you could have a batch file with the following commands:

Nice, isn’t it? On the second line when you call vss-setvar.cmd it will set the environment variables for you and then you can use the other commands to expose the shadow copy and finally delete it once it’s backed up.
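If you would rather drive this from a script than a batch file, the same idea can be sketched in Python. Treat everything here as an assumption to check against your own vss-setvar.cmd: I am assuming the generated file contains plain SET lines and that the shadow ID variable is called SHADOW_ID_1; the GUIDs, drive letter and backup paths are made up; and the sketch only builds the command lines (vshadow with -el to expose and -ds to delete, robocopy with /MIR) without running anything.

```python
import re


def parse_setvar(text):
    """Collect NAME=value pairs from the SET lines of a generated .cmd file."""
    variables = {}
    for line in text.splitlines():
        match = re.match(r"\s*@?SET\s+([^=\s]+)=(.*)$", line, re.IGNORECASE)
        if match:
            variables[match.group(1)] = match.group(2).strip()
    return variables


def build_commands(variables, mount_point, backup_source, backup_target):
    """Build (but do not run) the expose, backup and delete command lines."""
    shadow_id = variables["SHADOW_ID_1"]   # assumed variable name, check your file
    return [
        ["vshadow", "-el=%s,%s" % (shadow_id, mount_point)],   # expose the shadow copy
        ["robocopy", backup_source, backup_target, "/MIR"],    # back up from the shadow
        ["vshadow", "-ds=%s" % shadow_id],                     # delete the shadow copy
    ]


if __name__ == "__main__":
    # Hypothetical file contents; the GUIDs are placeholders.
    sample = (
        "@echo.\n"
        "SET SHADOW_SET_ID={00000000-0000-0000-0000-000000000001}\n"
        "SET SHADOW_ID_1={00000000-0000-0000-0000-000000000002}\n"
    )
    variables = parse_setvar(sample)
    for command in build_commands(variables, "X:", "X:\\", "D:\\Backup"):
        print(" ".join(command))
```

Each entry is a list of arguments, so it could be handed to subprocess.run as-is once you are happy with what it prints.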

It’s also possible to run a command as part of the shadow copy process. As in, this command will run when the shadow copy is created but before VShadow exits. This is useful when you are dealing with non-persistent shadow copies as the copy will still be present when the command runs. You can run any command or a CMD file, but no parameters can be passed. Here’s how (use the -exec switch):

For instance, here’s what the vss-robocopy.cmd could contain:

Now the robocopy backup happens as soon as the shadow copy finishes but before VShadow exits. This means I don’t have to use persistent shadow copies like before because when the backup happens the shadow copy is temporarily exposed anyway. I find this technique better – it feels neater to me.

While reading for this post I came across a page with more VShadow examples. Also, here’s someone using PowerShell to mount all shadow copies – good stuff. Maybe it will be of use to me later!

Lastly, if you are trying to make a persistent shadow copy and get the following error:

It’s quite likely that the volume you are snapshotting isn’t NTFS formatted. Remember, persistent copies require the volume to be NTFS.