P2V a SQL cluster by breaking the cluster

Need to P2V a SQL cluster at work. Here are screenshots of what I did in a test environment to see if an idea of mine would work.

We have a two-node physical SQL cluster. The requirement was to convert this into a single virtual machine.

P2V-ing a single server is easy: use VMware Converter. But P2V-ing a cluster like this is tricky. You could P2V each node and end up with a cluster of two virtual nodes, but that wasn’t what we wanted. We didn’t want to deal with RDMs and such for the cluster, so we wanted to get rid of the cluster itself. VMware HA can protect the single node if anything happens to it.

My idea was to break the cluster and get one of the nodes of the cluster to assume the identity of the cluster. Have SQL running off that. Virtualize this single node. And since there’s no change as far as the outside world is concerned, no one’s the wiser.

Found a blog post that pretty much does what I had in mind. Found one more which was useful but didn’t really pertain to my situation. Have a look at the latter post if your DTC is on the Quorum drive (wasn’t so in my case).

So here we go.

1) Make the node that I want to retain the active node of the cluster (so it owns all the disks and databases). Then shut down SQL Server.


2) Shut down the cluster.


3) Remove the node we want to retain from the cluster.

We can’t remove/ evict the node via the GUI as the cluster is offline. Nor can we remove the Failover Clustering feature from the node as it is still part of a cluster (even though the cluster is shut down). So we need to do a bit of “surgery”. :)

Open PowerShell and do the following:
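
The actual command was shown as a screenshot in the original post; the gist of it, assuming the FailoverClusters module that ships with the feature, is something like:

    # Load the failover clustering module and force-clear any cluster
    # configuration left on this node (the cmdlet is meant for evicted nodes)
    Import-Module FailoverClusters
    Clear-ClusterNode -Force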

This simply clears any cluster-related configuration from the node. It is meant to be used on evicted nodes.

Once that’s done, remove the Failover Clustering feature and reboot the node. If you want to do this via PowerShell:
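
Roughly along these lines (this is the Server 2008 R2 flavour; on Server 2012 and later the cmdlet would be Uninstall-WindowsFeature):

    # Remove the Failover Clustering feature and reboot
    Import-Module ServerManager
    Remove-WindowsFeature Failover-Clustering
    Restart-Computer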

4) Bring online the previously shared disks.

Once the node is up and running, open Disk Management and mark the shared disks that were previously part of the cluster as online.
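
If you’d rather script this, a rough sketch (the Storage module cmdlets below exist only on Server 2012 and later; on older versions stick with diskpart or Disk Management):

    # Bring online any offline disks and clear the read-only attribute
    Get-Disk | Where-Object IsOffline | ForEach-Object {
        Set-Disk -Number $_.Number -IsOffline $false
        Set-Disk -Number $_.Number -IsReadOnly $false
    }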


5) Change the IP and name of this node to that of the cluster.

Straightforward. Add CNAME entries in DNS if required. Also, you will have to remove the cluster computer object from AD before renaming this node to that name.
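
As a rough PowerShell sketch of the same (the names here are placeholders; Remove-ADComputer needs the ActiveDirectory module and Rename-Computer needs PowerShell 3.0 or later):

    # Delete the old cluster computer object from AD, then take over its name.
    # 'SQLCLUSTER' is a hypothetical cluster name; use your own.
    Remove-ADComputer -Identity 'SQLCLUSTER'
    Rename-Computer -NewName 'SQLCLUSTER' -Restart

The IP address change itself is just as easily done via the network adapter GUI.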

6) Make some registry changes.

SQL Server still won’t start because it expects to be on a cluster, so we make some registry changes.

First go to HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL10_50.MSSQLSERVER\Setup, open the entry called SQLCluster, and change its value from 1 to 0.

Then take a backup (just in case; we don’t really need it) of the key called HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL10_50.MSSQLSERVER\Cluster and delete it.

Note that MSSQL10_50.MSSQLSERVER may vary depending on the version and instance of SQL Server you have.
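
The same changes in PowerShell, assuming the instance path above (adjust MSSQL10_50.MSSQLSERVER to match yours):

    $base = 'HKLM:\SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL10_50.MSSQLSERVER'

    # Tell SQL Server it is no longer clustered
    Set-ItemProperty -Path "$base\Setup" -Name SQLCluster -Value 0

    # Back up the Cluster key (the target folder must exist), then delete the key
    reg export 'HKLM\SOFTWARE\Microsoft\Microsoft SQL Server\MSSQL10_50.MSSQLSERVER\Cluster' C:\Temp\SQLCluster-key.reg
    Remove-Item -Path "$base\Cluster" -Recurse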

7) Start the SQL services and change their startup type to Automatic.

I had 3 services.
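
Something like this does both, though the service names below are the defaults for a default instance and may differ in your case (check Get-Service):

    # Set the SQL services to start automatically and start them
    'MSSQLSERVER', 'SQLSERVERAGENT', 'SQLBrowser' | ForEach-Object {
        Set-Service -Name $_ -StartupType Automatic
        Start-Service -Name $_
    }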

Now your SQL server should be working.

8) Restart the server – not needed, but I did so anyway.


If you are doing this in a test environment (like I was) and don’t have any SQL applications to test with, do the following.

Right click the desktop on any computer (or the SQL server computer itself) and create a new text file. Then rename that to blah.udl. The name doesn’t matter as long as the extension is .udl. Double click on that to get a window like this:


Now you can fill in the SQL server name and test it.

One thing to keep in mind (if you are not a SQL person – I am not): Windows NT Integrated security is what you need to use if you want to authenticate against the server with an AD account. It is tempting to select the “Use a specific user name …” option and put an AD username/ password there, but that won’t work. That option is for SQL authentication.
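
In connection-string terms the difference looks like this; a quick way to test from PowerShell (the server name 'SQLCLUSTER' is a placeholder for whatever you renamed the node to):

    # Windows integrated authentication: uses the AD account running this session
    $conn = New-Object System.Data.SqlClient.SqlConnection 'Server=SQLCLUSTER;Integrated Security=SSPI'

    # SQL authentication would instead use something like:
    #   'Server=SQLCLUSTER;User ID=someSqlLogin;Password=...'

    $conn.Open()
    $conn.State      # reports 'Open' if the connection succeeded
    $conn.Close()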

If you want to use a different AD account you will have to launch the tool with “Run as” under that account.

Also, on a fresh install of SQL Server, SQL authentication is disabled by default. You can create SQL accounts but authentication will fail. To enable SQL authentication, right-click the server in SQL Server Management Studio, go to Properties, then Security, and enable SQL Server authentication.


That’s all!

Now one can P2V this node.

Notes on vSphere High Availability (HA)

Just some notes on vSphere HA as I read up on it. Nothing new here …

Starting with vSphere 5.0, HA has a Master/ Slave model. One ESXi host is elected as a Master, the rest are Slaves. The Master is the one with the most datastores connected to it; if all ESXi hosts have the same number of datastores connected, the Master is the one with the largest Managed Object ID (MOID). Note that the MOID is compared lexically – so an MOID of 99 is larger than 100. PowerCLI can be used to view the MOIDs:
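
The original post shows this as a screenshot; roughly, the Id property of each host contains its MOID:

    # The Id is of the form HostSystem-host-NN; the host-NN part is the MOID
    Get-VMHost | Select-Object Name, Id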

Also, the MOID is a vCenter-specific construct. Whenever a host, VM, datastore, etc. is added to vCenter it is assigned an MOID. For instance, here are the MOIDs of my datastores:
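
Again a screenshot in the original; the equivalent query would be:

    # Datastore MOIDs show up in the Id property, e.g. Datastore-datastore-NN
    Get-Datastore | Select-Object Name, Id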

Although I haven’t used this, it’s also possible to find MOIDs via the vSphere Managed Object Browser. See this KB article for more info.

Back to the topic – the above is how a Master is elected. There’s only one Master per cluster. When it comes to HA, the Fault Domain Manager (FDM) on this Master is responsible for most of the tasks (which is why even if vCenter is down for a while HA can continue working). vCenter checks with the Master and the Master communicates with vCenter to keep each other abreast of the cluster situation.

  • FDM is installed at /opt/vmware/fdm/fdm/
  • FDM config files are at /etc/opt/vmware/fdm/

The Master monitors the Slave hosts and if a Slave goes down/ is unreachable the Master is responsible for starting these Protected VMs elsewhere. The Master is also responsible for keeping the Slaves abreast of the cluster configuration.

Slaves are limited to monitoring the VMs running on them. Slaves monitor VM health, and if a Protected VM powers down they inform the Master so it can be restarted. (Note on Protected VMs: once you enable VM monitoring on a cluster or set a VM as Protected, the VM must be powered off and powered on to be protected.) Slaves also keep in touch with each other, and if they find the Master is down they conduct an election to select a new Master.

The only time vCenter communicates with Slaves is when a new Master needs to be elected or when the Master reports a Slave as missing and so vCenter tries to contact it.

Slaves send network heartbeats to the Master every second. When a Master stops receiving heartbeats from a Slave it knows it is offline or partitioned/ isolated. Similarly when a Slave stops receiving heartbeats from a Master it knows the Master is offline or partitioned/ isolated.

  • If a Slave is cut off from all other hosts (Master and Slaves) it is considered isolated (caveat: you can also specify up to 10 isolation IP addresses to ping – if these are reachable but the Master and Slaves are not, the Slave does not consider itself isolated, only partitioned).
  • If a Slave is cut off from the Master but still has contact with some of the other Slaves, it is considered partitioned.

In the past, if a Slave were isolated/ partitioned, the Master would consider it offline and restart its Protected VMs elsewhere. Starting with vSphere 5.0 the Master also sends a ping (ICMP packet) to the Slave to see if it responds, and uses datastore heartbeats to verify the Slave is really down. It could be that only the Management network is down while the VM and storage networks are up, so the VMs are still functioning as expected.

Datastore heartbeats work thus (and remember they are only used in case of isolation/ partition scenarios):

  • When enabling HA for a cluster, a datastore is automatically selected (or can be selected manually by the user) to be used for datastore heartbeats.
  • On this datastore a folder called .vSphere-HA is created within which a sub-folder of name FDM-<Fault Domain ID>-<vCenter Server Name> is created. (Such a name allows the same datastore to be used by multiple clusters).
  • Each host creates a file named after its MOID in this sub-folder.
  • Notice the host-X-hb file above? That is created by each host (you can check the /var/log/fdm.log file on each host to see it creating this file). When a Slave does not get heartbeats from the Master it updates its own file (and also checks the timestamp of the Master’s file – if that has updates it means the Master is alive). Similarly, when a Master does not hear from a Slave it checks the Slave’s file to see if there are updates. This is how datastore heartbeats work.
  • If a Slave is network partitioned – i.e. it cannot contact the Master – but can see some of the other Slaves, the Master and Slave can conclude that each other is still alive from the datastore heartbeats as above.
    • If the Master is down – i.e. the Slaves think they are partitioned because actually the Master is down – they can now elect a new Master since there are no datastore heartbeats from the Master.
    • If the Slave is down – i.e. the Master is not getting any datastore heartbeats from the Slave – then it restarts the Protected VMs on other hosts. (If the Slave were actually up but had lost network access to the datastore and so cannot update heartbeats, it is as good as down because the VMs have probably crashed by now).
  • If a Slave is network isolated – i.e. it cannot contact the Master or any other Slave (nor can it ping the isolation addresses) – then the Slave adds a special bit in the host-X-poweron file above. This tells the Master that the Slave is network isolated.
    • The Master then locks the file called protectedlist. This is a list of all Protected VMs. Once the Master has locked this file, the Slave knows the Master has taken responsibility for the Protected VMs and the Slave can leave these powered on, shut down, or power off (depending on which of these is selected as the host isolation response when setting up HA).
    • The protectedlist file thus ensures that unless another host has taken over these VMs the current host will not shut down/ power off these.

Two advanced options to keep in mind:

  • I mentioned this earlier: das.isolationAddress[0-9] allows one to specify up to 10 isolation IP addresses to check before a host considers itself isolated.
  • And das.allowNetwork[0-9] allows one to specify up to 10 port groups to use for HA. See this KB article for examples. A PowerCLI sketch for setting these follows this list.
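
A rough PowerCLI sketch for setting these on a cluster (the cluster name and addresses here are made up, and you need to be connected to vCenter with Connect-VIServer first):

    # Add two extra isolation addresses to the cluster's HA advanced settings
    $cluster = Get-Cluster -Name 'Prod-Cluster'
    New-AdvancedSetting -Entity $cluster -Type ClusterHA -Name 'das.isolationaddress0' -Value '192.168.1.1' -Force
    New-AdvancedSetting -Entity $cluster -Type ClusterHA -Name 'das.isolationaddress1' -Value '192.168.2.1' -Force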

Lastly, I haven’t read it fully but this HA Deepdive is a great resource.