Creative Commons Attribution 4.0 International License
© Rakhesh Sasidharan

Misc ESXi/ vSphere stuff

Just some notes to myself so I can refer to this later.

  • You can only have a maximum of 256 VMFS datastores per ESXi host. (This is one reason why you wouldn’t want to create a LUN/ datastore per VM – it wouldn’t work if you have a lot of VMs!)
    • Other maximums (for vSphere 5.5) can be found at this link.
  • When you create a distributed switch port group there are three port binding options:
    • Static Binding (the default): VM NICs are connected to the port group at VM creation and remain connected until the VM is removed from the port group. Powering off a VM or disconnecting its NIC from the port group does not remove it from the port group – the port is still kept aside for the VM. In other words, once you connect a VM to a port it stays with that port forever.
      • Since the ports are assigned at VM creation, even if vCenter is down when the VM later powers on/ connects to the port group, it will continue to have network connectivity. (Note the emphasis on “later”. If the VM were already running and vCenter were to go down network traffic isn’t affected in either of the binding options).
    • Dynamic Binding (deprecated): VM NICs are connected to the port group only when the VM is powered on and its NIC connected to the port group. Power off the VM or disconnect the NIC and it is no longer connected to the same port when it comes back on or is reconnected.
      • Since the port binding happens only when the VM is powered on or connected, and the port group resides with vCenter, what this means is that you can only power on / off such VMs via vCenter. If vCenter is off / unreachable when the VM powers on / connects, it will not have network connectivity as it won’t have a port in the port group. (As above, note that this doesn’t affect VMs that are already running).
      • Dynamic Binding is deprecated but is useful when the number of VMs is larger than the number of ports in the port group and not all VMs will be on / connected at the same time.
    • Ephemeral Binding: Similar to Dynamic Binding, VM NICs are connected to the port group only when the VM is powered on and its NIC connected to the port group. Powering off the VM or disconnecting it results in the port being removed from the port group.
      • Although Dynamic and Ephemeral Bindings seem similar, they don’t have similar limitations. Thus while VMs with Dynamic Binding port groups won’t have network connectivity if they are powered on / connected when vCenter is off / unreachable, VMs with Ephemeral Binding have no such limitation. They don’t get a proper port number from the port group, but get a temporary one like h-1 which changes to a proper port number whenever connectivity with vCenter is restored.
      • The below screenshot shows the port numbers of three VMs, each connected to a port group of a different binding (Ephemeral, Dynamic, Static from top to bottom) and powered on when vCenter was unreachable.
      • Although the NIC is unable to get a port from vCenter – as with Dynamic Binding – with an Ephemeral Binding port group the host creates a fake port and connects the VM anyway.
      • I don’t understand why Dynamic Binding even exists as an option – unless it’s for backward compatibility? Ephemeral Binding seems to have the advantage of Dynamic Binding – ports are created at VM connection/ powering on and so you can oversubscribe a port group – but doesn’t have the disadvantage of lost connectivity when vCenter is off/ unreachable. (I assume Ephemeral port groups too can be used for oversubscribing, though the official KB articles don’t say anything like this so I could be wrong).
      • Dynamically creating / removing ports from the port group is an expensive operation so Dynamic and Ephemeral Binding port groups have a performance overhead. Static Binding is the preferred one.
      • Also, Ephemeral Binding port groups lose their history and security controls across host reboots. Apparently Dynamic Binding port groups don’t do this as I don’t see any mention of this as a Dynamic Binding limitation anywhere.

That’s all for now!

 

Get ESXi host network info using PowerShell/ PowerCLI

Not an exhaustive post, I am still exploring this stuff.

To get a list of network adapters on a host:
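A sketch of what I mean (the host name esx01 is a placeholder – substitute your own):

```powershell
Get-VMHost esx01 | Get-VMHostNetworkAdapter
```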

To get a list of virtual switches on a host, with the NICs assigned to these:
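Something along these lines (host name is again a placeholder):

```powershell
Get-VMHost esx01 | Get-VirtualSwitch | Select-Object Name, Nic
```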

To get a list of port groups on a host:
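For example (placeholder host name):

```powershell
Get-VMHost esx01 | Get-VirtualPortGroup
```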

To get a list of port groups, the virtual switches they are mapped to, and the NICs that make up these switches:
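A rough sketch of how I'd combine them – the calculated property reaching into the port group's VirtualSwitch object is my assumption about the object model:

```powershell
Get-VMHost esx01 | Get-VirtualPortGroup |
    Select-Object Name, VirtualSwitchName, @{N="Nic";E={$_.VirtualSwitch.Nic}}
```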

This essentially combines the first and third cmdlets above.

More later!

ESXi 4.0 “unsupported” mode

At work we still use some ESXi 4.0 hosts so this one’s a reminder to myself as ESXi Shell access works slightly different with that one.

On ESXi 4.0 once we are on the DCUI screen, pressing Alt+F1 gives access to a different (hidden) console. Whatever you type here seems to have no effect, but if you type the word unsupported and press Enter, you will be prompted to enter the root password and enter the Tech Support Mode (TSM). For screenshots and such of this check out this blog post.

On ESXi 4.1 and above you can enable this via the DCUI. See this KB article for the deets.

On that note here’s a good blog post detailing various ways of enabling SSH access on an ESXi host. Informative.

Number of IPv4 routes did not match

Was creating / migrating some ESXi hosts during the week and came across the above error “Number of IPv4 routes did not match” when checking for host profile compliance of one of the hosts. All network settings of this host appeared to be same as the rest so I was stumped as to what could be wrong. Via a VMware KB article I came across the esxcfg-route command that helped identify the problem. To run this command SSH into the host:

By default the command only outputs the default gateway but you can pass it the -l switch to list all routes:
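That is:

```shell
esxcfg-route       # shows only the default gateway
esxcfg-route -l    # lists all routes, one per VMkernel interface
```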

In my case the above output was from one of the hosts, while the following was from the non-compliant host:

Notice the vmk2 interface has the wrong network. Not sure how that happened. Oddly the GUI didn’t show this incorrect network but obviously something was corrupt somewhere.

To fix that I thought I’d remove the vmk2 interface and re-add it. Big mistake! Possibly because its network was the same as that of the management network (10.50.0.0/24), removing this interface caused the host to lose connectivity from vCenter. I could ping it but couldn’t connect to it via SSH, vSphere Client, or vCenter. Finally I had to reset the network via the DCUI – it’s under “Network Restore Options”. I tried “Restore vDS” first, which didn’t help, so I did a “Restore Standard Switch”. This is very useful – it creates a new standard switch and moves the Management Network onto that so you get connectivity to the host. This way I was able to reconnect to the host, but then I stumbled upon a new problem.

The host didn’t have the vmk2 interface any more but when I tried to recreate it I got an error that the interface already exists. But no, it does not – the GUI has no trace of it! Some forum posts suggested restarting the vCenter service as that clears its cache and puts it in sync with the hosts but that didn’t help either. Then I came across this post which showed me that it is possible for the host to still have the VMkernel port but vCenter to not know of it. For this the esxcli command is your friend. To list all VMkernel ports on a host do the following:
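The listing command is:

```shell
esxcli network ip interface list
```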

After that, removing the VMkernel interface can be done by a variant of same command:
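For example, for the vmk2 interface in question:

```shell
esxcli network ip interface remove --interface-name=vmk2
```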

Now I could re-add the interface via vSphere and get the hosts into compliance.

Before I conclude this post though, a few notes on the commands above.

If you have PowerCLI installed you can run all the esxcli commands via the Get-EsxCli cmdlet. For example:
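A sketch (host name is a placeholder; this is the classic Get-EsxCli object syntax, which mirrors the esxcli namespaces as nested properties):

```powershell
$esxcli = Get-EsxCli -VMHost (Get-VMHost esx01)
$esxcli.network.ip.interface.list()
```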

If I wanted to remove the interface via PowerCLI the command would be slightly different:
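Something like the following – the remove() method taking the interface name as its argument is my recollection of the object's signature, so verify with $esxcli.network.ip.interface before relying on it:

```powershell
$esxcli.network.ip.interface.remove("vmk2")
```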

I would have written more on the esxcli command itself but this excellent blog post covers it all. It’s an all powerful command that can be used to manage many aspects of the ESXi host, even set it in maintenance mode!

Heck, you can even use esxcli to upgrade from one ESXi version to another. It is also possible to run the esxcli command from a remote computer (Windows or Linux) by installing the vSphere CLI tools on that computer. Additionally, there’s also the vSphere Management Assistant (vMA), a virtual appliance that offers command line tools.

The esxcli is also useful if you want to kill a VM. For instance the following lists all running VMs on a host:
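The listing command:

```shell
esxcli vm process list
```

The output includes each VM's World ID, which is what the kill command needs.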

If that VM were stuck for some reason and cannot be stopped or restarted via vSphere it’s very useful to know the esxcli command can be used to kill the VM (has happened a couple of times to me in the past):
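For example (the World ID here is a placeholder – take the real one from the process list output):

```shell
esxcli vm process kill --type=soft --world-id=123456
```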

Regarding the type of killing you can do:

There are three types of VM kills that can be attempted: [soft, hard, force]. Users should always attempt ‘soft’ kills first, which will give the VMX process a chance to shut down cleanly (like kill or kill -SIGTERM). If that does not work move to ‘hard’ kills which will shut down the process immediately (like kill -9 or kill -SIGKILL). ‘force’ should be used as a last resort attempt to kill the VM. If all three fail then a reboot is required.

Another command line option is vim-cmd which I stumbled upon from one of the links above. I haven’t used it much so as a reference to myself here’s a blog post explaining it in detail.

Lastly there’s also a bunch of esxcfg-* commands, one of which we came across above.

I haven’t used these much. They seem to be present for compatibility with ESX/ESXi 3.x and prior. The same commands also exist with a vicfg- prefix in the remote vSphere CLI tools. Most of their functionality is now in esxcli – for instance, esxcfg-vmknic is replaced by esxcli network ip interface as we saw above.

That’s all for now!

Update: Thought I’d use this post to keep track of other useful commands.

To get IPv4 addresses details:
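The command:

```shell
esxcli network ip interface ipv4 get
```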

Replace ipv4 with ipv6 if that’s what you want.

To set an IPv4 address:
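For example (interface name and addresses are placeholders):

```shell
esxcli network ip interface ipv4 set --interface-name=vmk0 --ipv4=10.50.0.10 --netmask=255.255.255.0 --type=static
```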

To ping an address from the host:
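A sketch (target address is a placeholder; the diag namespace is present on ESXi 5.1 and later as far as I recall):

```shell
esxcli network diag ping --host=10.50.0.1 --count=3
```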

Change keyboard layout:

Get current keyboard layout:

List available layouts:

Set a new layout:
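The three keyboard-layout operations above can be sketched as follows (the layout name in the last command is an example – pick one from the list output):

```shell
esxcli system settings keyboard layout get
esxcli system settings keyboard layout list
esxcli system settings keyboard layout set --layout="United Kingdom"
```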

Remotely enable SSH

The esxcli commands are cool but you need to enable SSH each time you want to connect to the host and run these (unless you install the CLI tools on your machine). If you have PowerCLI though you can enable SSH remotely.

To list the services:
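For example (placeholder host name):

```powershell
Get-VMHost esx01 | Get-VMHostService
```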

To enable SSH and the ESXi shell:
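Something like this – the TSM-SSH and TSM keys are the SSH and ESXi Shell services respectively:

```powershell
Get-VMHost esx01 | Get-VMHostService | Where-Object { $_.Key -eq "TSM-SSH" } | Start-VMHostService
Get-VMHost esx01 | Get-VMHostService | Where-Object { $_.Key -eq "TSM" } | Start-VMHostService
```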

 

VMware: “A specified parameter was not correct” error

Was trying to delete a VM template but it kept throwing the above error. I had a feeling this was because the underlying disk was missing in the datastore (because I couldn’t find any folder with the same name as the VM in the datastore) but there was no way to confirm this as you can’t right-click a template and view its settings.

Thanks to PowerCLI though, you can:

The Get-HardDisk cmdlet can be used to return the hard disks used by a VM or template. It can even be used to return all hard disks on a datastore (or in a specified path on the datastore):
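A sketch of the variants (template, datastore, and path names are all placeholders):

```powershell
Get-Template tmpl01 | Get-HardDisk
Get-HardDisk -Datastore DS01
Get-HardDisk -DatastorePath "[DS01] somefolder/"
```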

 

PowerCLI – List all VMs in a cluster along with number of snapshots and space usage

More as a note to myself than anyone else, here’s a quick and dirty way to list all the VMs in a cluster with the number of snapshots, the used space, and the provisioned space. Yes you could get this information from the GUI but I like PowerShell and am trying to spend more time with PowerCLI.
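Here's roughly what such a one-liner could look like (cluster name is a placeholder):

```powershell
Get-Cluster Prod | Get-VM | Select-Object Name,
    @{N="Snapshots";E={($_ | Get-Snapshot).Count}},
    @{N="UsedSpaceGB";E={[math]::Round($_.UsedSpaceGB, 2)}},
    @{N="ProvisionedSpaceGB";E={[math]::Round($_.ProvisionedSpaceGB, 2)}}
```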

 

PowerShell – Create a list of all Exchange mailboxes in an OU with mailbox size, Inbox size, etc

As the title says, here’s a one-liner to quickly create a list of all Exchange mailboxes in an OU with mailbox size, Inbox size, Sent Items size, and the number of items in each of these folders.
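A sketch of the idea (OU path is a placeholder; the calculated properties call the folder-statistics cmdlet per mailbox, so this is slow on large OUs):

```powershell
Get-Mailbox -OrganizationalUnit "contoso.com/Sales" -ResultSize Unlimited | Select-Object Name,
    @{N="MailboxSize";E={(Get-MailboxStatistics $_).TotalItemSize}},
    @{N="InboxSize";E={(Get-MailboxFolderStatistics $_ -FolderScope Inbox | Select-Object -First 1).FolderSize}},
    @{N="SentItemsSize";E={(Get-MailboxFolderStatistics $_ -FolderScope SentItems | Select-Object -First 1).FolderSize}}
```

Item counts per folder can be added the same way via the ItemsInFolder property of the folder statistics.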

 

FYI: Self Encrypting Drives must be uninitialized for BitLocker Hardware encryption

Got myself a new 1TB Crucial MX200 SSD today. This is a Self Encrypting Drive like my other SSDs. When I tried enabling BitLocker on it as I usually do, I noticed that it was asking me about how to encrypt the drive and taking more time with the encryption than I have seen in the past with SED drives that support the TCG OPAL standard. 

Not good if you get this screen!


Something was not right. So I went back to Microsoft’s page on BitLocker and SEDs and noticed that one of the requirements was that the drive must be uninitialized! Damn! In the past I usually enable encryption and then copy over data, but today I had copied the data first (thus initializing the drive and creating partitions) and then I was trying to enable encryption. Obviously that was a no-go so I had to copy the data out of the drive, uninitialize it, and then turn on BitLocker encryption.

Uninitializing is easy via diskpart:
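Something like the following session – the disk number here is an example, so double-check against the list disk output, because clean wipes the partition table of whatever disk is selected:

```shell
diskpart
DISKPART> list disk
DISKPART> select disk 2
DISKPART> clean
```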

Now Disk Management will show the disk as uninitialized. 


Create partitions as usual but before writing any data to the disk turn on BitLocker encryption. This time it will be a one-second operation and you won’t get a prompt like above. 

To confirm that the drive is hardware encrypted (in case you wonder whether BitLocker didn’t just zip through coz the drive had no data on it) use the manage-bde command:
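The drive letter below is a placeholder; in the output, look at the “Encryption Method” field – it should say Hardware Encryption rather than one of the software AES methods:

```shell
manage-bde -status D:
```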

As you can see the drive is hardware encrypted. 

Load balancing in vCenter and ESXi

One of the things you can do with a portgroup is define teaming for the underlying physical NICs.

(Screenshot: NIC teaming settings on a portgroup)

If you don’t do anything here, the default setting of “Route based on originating virtual port” applies. What this does is quite obvious. Each virtual port on the virtual switch is mapped to a physical NIC behind the scenes; so all traffic to & from that virtual port goes & comes via that physical NIC. Since your virtual NIC connects to a virtual port this is equivalent to saying all traffic for that virtual NIC happens via a particular physical NIC.

In the screenshot above, for instance, I have two physical NICs dvUplink1 and dvUplink2. If I left teaming at the default setting and say I had 4 VMs connecting to 4 virtual ports, chances are two of these VMs will use dvUplink1 and two will use dvUplink2. They will continue using these mappings until one of the dvUplinks dies, in which case the other will take over – so that’s how you get failover.

This is pretty straightforward and easy to set up. And the only disadvantage, if any, is that you are limited to the bandwidth of a single physical NIC. If each of dvUplink1 & dvUplink2 were 1Gb NICs it isn’t as though the underlying VMs had 2Gb (2 NICs x 1Gb each) available to them. Since each VM is mapped to one uplink, 1Gb is all they get.

Moreover, if say two VMs were mapped to an uplink, and one of them was hogging up all the bandwidth of this uplink while the remaining uplink was relatively free, the other VM on this uplink won’t automatically be mapped to the free uplink to make better use of resources. So that’s a bummer too.

A neat thing about “Route based on originating virtual port” is that the virtual port is fixed for the lifetime of the virtual machine so the host doesn’t have to calculate which physical NIC to use each time it receives traffic to & from the virtual machine. Only if the virtual machine is powered off, deleted, or moved to a different host does it get a new virtual port.

The other options are:

  • Route based on MAC hash
  • Route based on IP hash
  • Route based on physical NIC load
  • Explicit failover

We’ll ignore the last one for now – that just tells the host to use the first physical NIC in the list and use that for all VMs.

“Route based on MAC hash” is similar to “Route based on originating virtual port” except that it uses the MAC address of the virtual NIC instead of the virtual port. I am not very clear on how this is better than the latter. Since the MAC address of a virtual machine is usually constant (unless it is changed or a different virtual NIC is used) all traffic from that MAC address will always use the same physical NIC. Moreover, there is additional overhead in that the host has to check each packet for the MAC address and decide which physical NIC to use. VMware documentation says it provides a more even distribution of traffic but I am not clear how.

“Route based on physical NIC load” is a good one. It starts off like “Route based on originating virtual port”, but if a physical NIC is loaded, the virtual ports mapped to it are moved to a physical NIC with less load! This load balancing option is only available for distributed switches. Every 30s the distributed switch checks the physical NIC load and if it exceeds 75% the virtual port of the VM with highest utilization is moved to a different physical NIC. So you have the advantages of “Route based on originating virtual port” with one of its major disadvantages removed.

In fact, except for “Route based on IP hash” none of the other load balancing mechanisms have an option to utilize more than a single physical NIC bandwidth. And “Route based on IP hash” does not do this entirely as you would expect.

“Route based on IP hash”, as the name suggests, does load balancing based on the IP hash of the virtual machine and the remote end it is communicating with. Based on a hash of these two IP addresses all traffic for the communication between these two IPs is sent through one NIC. So if a virtual machine is communicating with two remote servers, it is quite likely that traffic to one server goes through one physical NIC while traffic to the other goes via another physical NIC – thus allowing the virtual machine to use more bandwidth than that of one physical NIC. However – and this is an often overlooked point – all traffic between the virtual server and one remote server is still constrained by the bandwidth of the physical NIC it happens via. Once traffic is mapped to a particular physical NIC, if more bandwidth is required or the physical NIC is loaded, it is not as though an additional physical NIC is used. This is a catch with “Route based on IP hash” that’s worth remembering.

If you select “Route based on IP hash” as a load balancing option you get two warnings:

  • With IP hash load balancing policy, all physical switch ports connected to the active uplinks must be in link aggregation mode.
  • IP hash load balancing should be set for all port groups using the same set of uplinks.

What this means is that unlike the other load balancing schemes where there was no additional configuration required on the physical NICs or the switch(es) they connect to, with “Route based on IP hash” we must combine/ bond/ aggregate the physical NICs as one. There’s a reason for this.

In all the other load balancing options the virtual NIC MAC is associated with one physical NIC (and hence one physical port on the physical switch). So incoming traffic for a VM knows which physical port/ physical NIC to go via. But with “Route based on IP hash” there is no such one to one mapping. This causes havoc with the physical switch. Here’s what happens:

  • Different outgoing traffic flows choose different physical NICs. With each of these packets the physical switch will keep updating its MAC address table with the port the packet came from. So for instance, say the two physical NICs are connected to physical switch Port1 and Port2 and the virtual NIC MAC address is VMAC1. When an outgoing traffic packet goes via the first physical NIC, the switch will update its tables to reflect that VMAC1 is connected to Port1. Subsequent traffic flows might continue using the first physical NIC so all is well. Then say a traffic flow uses the second physical NIC. Now the switch will map VMAC1 to Port2; then a traffic flow could use Port1 so the mapping gets changed to Port1, and then Port2, and so on …
  • When incoming traffic hits the physical switch for MAC address VMAC1, the switch will look up its tables and decide which port to send traffic on. If the current mapping is Port1 traffic will go out via that; if the current mapping is Port2 traffic will go out via that. The important thing to note is that the incoming traffic flow port chosen is not based on the IP hash mapping – it is purely based on whatever physical port the switch currently has mapped for VMAC1.
  • So what’s required is a way of telling the physical switch that the two physical NICs are to be considered as bonded/ aggregated such that traffic from either of those NICs/ ports is to be treated accordingly. And that’s what EtherChannel does. It tells the physical switch that the two ports/ physical NICs are bonded and that it must route incoming traffic to these ports based on an IP hash (which we must tell EtherChannel to use while configuring it).
  • EtherChannel also helps with the MAC address table in that now there can be multiple ports mapped to the same MAC address. Thus in the above example there would now be two mappings VMAC1-Port1 and VMAC1-Port2 instead of them over-writing each other!

“Route based on IP hash” is a complicated load balancing option to implement because of EtherChannel. And as I mentioned above, while it does allow a virtual machine to use more bandwidth than a single physical NIC, an individual traffic flow is still limited to the bandwidth of a single physical NIC. Moreover there is more overhead on the host because it has to calculate the physical NIC used for each traffic flow (essentially each packet).

Prior to vCenter 5.1 only static EtherChannel was supported (unless you use a third party virtual switch such as the Cisco Nexus 1000V). Static EtherChannel means you explicitly bond the physical NICs. But from vCenter 5.1 onwards the inbuilt distributed switch supports LACP (Link Aggregation Control Protocol) which is a way of automatically bonding physical NICs. Enable LACP on both the physical switch and distributed switch and the physical NICs will automatically be bonded.

(To enable LACP on the physical NICs go to the uplink portgroup that these physical NICs are connected to and enable LACP).

That’s it for now!

Update

Came across this blog post which covers pretty much everything I covered above but in much greater detail. A must read!

VCSA: Unable to connect to server. Please try again.

Most likely you set the VCSA to regenerate its certificates upon reboot and forgot to uncheck it after the reboot. (It’s under Admin > Certificate Regeneration Enabled). So each time you reboot VCSA gets a new certificate and your browser throws the above error.

Fix is to refresh (Ctrl+F5 in Firefox) the page so the new certificate is fetched and you get a prompt about it.

A very brief intro to Port Groups, Standard and Distributed switches

A year ago I went for VMware training but never got around to using it at work. Now I am finally into it, but I’ve forgotten most of the concepts. And that sucks!

So I am slowly re-learning things as I go along. I am in this weird state where I sort of remember bits and pieces from last year but at the same time I don’t really remember them.

What I have been reading about these past few days (or rather, trying to read these past few days) is networking. The end goal is distributed switches but for now I am starting with the basics. And since I like to blog these things as I go along, here we go.

You have a host. The server that runs ESXi (hypervisor).

This host has physical NICs. Hopefully oodles of them, all connected to your network.

This server runs virtual machines (a.k.a guests). These guests see virtual NICs that don’t really exist except in software, exposed by ESXi.

What you need is for all these virtual NICs to be able to talk to each other (if needed) as well as talk to the outside world (via the physical NICs and the switches they connect to).

You could create one big virtual switch and connect all the physical and virtual NICs to it. (This virtual switch is again something which does not physically exist). All the guests can thus talk to each other (as they are on the same switch) and also talk to the outside world (because the virtual switch is connected to the outside world via whatever it is connected to).

But maybe you don’t want all the virtual NICs to be able to talk to each other. You want little networks in there – a la VLANs – to isolate certain traffic from the rest. There are two options here:

  1. Create separate virtual switches for each network, and assign some virtual NICs to some switches. The physical NICs that connect to these virtual switches will connect to separate physical switches so you are really limited in the number of virtual switches you have by the number of physical NICs you have. Got 2 physical NICs, you can create 2 virtual switches; got 5 physical NICs, you can create 5 virtual switches.
  2. Create one big virtual switch as before, but use port groups. Port groups are the VMware equivalent of VLANs (well, sort of; they do more than just VLANs). They are a way of grouping the virtual ports on the virtual switch such that only the NICs connected to a particular port group can talk to each other. You can create as many port groups as you want (within limits) and assign all your physical NICs to this virtual switch and use VLANs so the traffic flowing out of this virtual switch to the physical switch is on separate networks. Pretty nice stuff!

(In practice, even if you create separate virtual switches you’d still create a port group on that – essentially grouping all the ports on that switch into one. That’s because port groups are used to also apply policies to the ports in the group. Policies such as security, traffic shaping, and load balancing/ NIC teaming of the underlying physical NICs. Below is a screenshot of the options you have with portgroups).

Example of a Portgroup

Now onto standard and distributed switches. In a way both are similar – in that they are both virtual switches – but the difference is that a standard switch exists on & is managed by a host whereas a distributed switch exists on & is managed by vCenter. You create a distributed switch using vCenter and then you go to each host and add its physical NICs to the distributed switch. As with standard switches, you can create portgroups in distributed switches and assign VM virtual NICs to these portgroups.

An interesting thing when it comes to migration (obvious but I wasn’t sure about this initially) is that if you have a host with two NICs – one of which is a member of a standard switch and the other of a distributed switch – but both NICs connect to the same physical network (or VLAN), and you have VMs in this host some of which are on the standard switch and others are on the distributed switch, all these VMs can talk to each other through the underlying physical network. Useful when you want to migrate stuff.

I got side tracked at this point with other topics so I’ll conclude this post here for now.

Adding DHCP scope options via PowerShell

Our deployment team needed a few DHCP options set for all our scopes. There was a brickload of these scopes, no way I was going to go to each one of them and right-click add the options! I figured this was one for PowerShell!

Yes, I ended up taking longer with PowerShell coz I didn’t know the DHCP cmdlets but hey (1) now I know! and (2) next time I got to do this I can get it done way faster. And once I get this blog post written I can refer back to it that time.

The team wanted four options set:

  • Predefined Option 43 – 010400000000FF
  • Custom Option 60 – String – PXEClient
  • Predefined Option 66 – 10.x.x.x
  • Predefined Option 67 – boot\x86\wdsnbp.com

PowerShell 4 (included in Windows 8.1 and Server 2012 R2) has a DHCP module providing a bunch of DHCP cmdlets.

First things first – I needed to filter out the scopes I had to target. Get-DhcpServerv4Scope is your friend for that. It returns an array of scope objects – filter these the usual way. For example:
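For example, to pick out scopes whose name starts with a given prefix (the prefix here is hypothetical – filter on whatever distinguishes your scopes, Name or ScopeId):

```powershell
Get-DhcpServerv4Scope | Where-Object { $_.Name -match "^Deploy" }
```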

Now, notice that one of the options to be added is a custom one. Meaning it doesn’t exist by default. Via the GUI you would add it by right clicking on “IPv4” and selecting “Set Predefined Options” then adding the option definition. But I am doing the whole thing via PowerShell so here’s what I did:
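A sketch of the definition for custom option 60 (the description text is mine):

```powershell
Add-DhcpServerv4OptionDefinition -OptionId 60 -Name "PXEClient" -Type String -Description "PXE client vendor class"
```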

To add an option the Set-DhcpServerv4OptionValue is your friend. For example:
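For instance, setting option 67 on a scope (the scope ID is a placeholder):

```powershell
Set-DhcpServerv4OptionValue -ScopeId 10.1.1.0 -OptionId 67 -Value "boot\x86\wdsnbp.com"
```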

I had a bit of trouble with option 43 because it has a vendor defined format and I couldn’t input the value as given. From the help pages though I learnt that I have to give it in chunks of hex. Like thus:
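That is, the binary value 010400000000FF goes in as a series of hex bytes (scope ID again a placeholder):

```powershell
Set-DhcpServerv4OptionValue -ScopeId 10.1.1.0 -OptionId 43 -Value 0x01,0x04,0x00,0x00,0x00,0x00,0xFF
```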

Wrapping it all up, here’s what I did (once I added the new definition):
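Roughly like so – the scope filter and the option 66 value are placeholders (the post elides the actual TFTP server address):

```powershell
$scopes = Get-DhcpServerv4Scope | Where-Object { $_.Name -match "^Deploy" }
foreach ($scope in $scopes) {
    Set-DhcpServerv4OptionValue -ScopeId $scope.ScopeId -OptionId 43 -Value 0x01,0x04,0x00,0x00,0x00,0x00,0xFF
    Set-DhcpServerv4OptionValue -ScopeId $scope.ScopeId -OptionId 60 -Value "PXEClient"
    Set-DhcpServerv4OptionValue -ScopeId $scope.ScopeId -OptionId 66 -Value "10.1.1.5"   # placeholder TFTP server
    Set-DhcpServerv4OptionValue -ScopeId $scope.ScopeId -OptionId 67 -Value "boot\x86\wdsnbp.com"
}
```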

And that’s it!

Install VMware tools is grayed out in Workstation

Came across this problem today and couldn’t find any Google hits that helped me. Finally hit upon a solution.

VMware tools requires the guest to have a CD drive. In my case the physical host doesn’t have a CD drive, and I had no need to mount any CDs in the guest, so while creating the guest I removed the CD drive. No CD drive => no place for VMware to insert the CD. But rather than complain about it, it simply grays out the option.

So that’s it. Enable the CD drive and you will be able to install VMware tools! Hope this helps.

Get a list of recently installed Windows updates via the command line

In a previous post I gave a DISM command to get a list of installed Windows Updates:
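That command was along these lines:

```shell
dism /online /get-packages
```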

While useful, that command has no option of filtering results based on some criteria.

If you are on Windows 8 or above the Get-WindowsPackage cmdlet can be of use:
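For example, filtering on the InstallTime property:

```powershell
Get-WindowsPackage -Online | Where-Object { $_.InstallTime -gt (Get-Date).AddDays(-15) }
```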

This gets me all updates installed in the last 15 days. 

Another alternative (on pre-Windows 8 machines) is good ol’ WMIC:
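The basic listing:

```shell
wmic qfe list brief
```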

The above gives output similar to this:

For more details more switches can be used:
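Namely:

```shell
wmic qfe list full
```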

Result is:

This output also gives an idea of the criteria available. 

So how can I filter this output like I did with PowerShell? Easy – use WQL (WMI Query Language). Inspired by a blog post I found (which I am sure I have referred to in the past too) either of the following will do the trick:

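The two equivalent forms differ only in quoting style (the month/year filtered on is an example):

```shell
wmic qfe where "InstalledOn like '5/%/2015'" list brief
wmic qfe where (InstalledOn like "5/%/2015") list brief
```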

And if you want to format the output with specific fields:
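For instance (same example filter):

```shell
wmic qfe where "InstalledOn like '5/%/2015'" get HotFixID,Description,InstalledOn
```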

Which results in something along these lines:

This includes Updates, Hotfixes, and Security Updates. If you want to filter down further, that too is possible (just mentioning these as a reference to my future self). Do a specific match:

Or a wildcard:

Or a negation:
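The three filters sketched out (Description values are examples):

```shell
wmic qfe where "Description='Security Update'" get HotFixID,InstalledOn
wmic qfe where "Description like '%Update%'" get HotFixID,InstalledOn
wmic qfe where "Description<>'Security Update'" get HotFixID,InstalledOn
```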

These two links (WQL and WHERE clauses) were of use in picking up the WQL syntax. They are not very explanatory but you get an idea by trial and error. Once I had picked up the syntax I came across this about_WQL page that’s part of the PowerShell documentation and explains WQL operators. Linking to it here as a reference to myself and others. 

Unlike PowerShell I don’t know how to make WMIC use a greater than operator and simply specify the date. I tried something like this (updates installed after 12th May 2015):
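What I tried was along these lines – my guess is that it misbehaves because InstalledOn is a string, so the comparison is lexicographic rather than chronological:

```shell
wmic qfe where "InstalledOn > '5/12/2015'" list brief
```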

But the results include some updates from 2013 & 2014 too. Not sure what’s wrong and I am not in the mood to troubleshoot at the moment. The like operator does the trick well for me currently. 

3Par Course: Day 1

There’s a 3Par training going on at work since today. I have no experience with 3Pars so this is my first encounter with one really. I know a bit about storage thanks to having worked with HP LeftHands (now called HP StoreVirtual) and Windows Server 2012 Storage Spaces – but 3Par is a whole other beast!

Will post more notes on today’s material tomorrow (hopefully!) but here’s some bits and pieces before I go to sleep:

  • You have disks.
  • These disks are in magazines (in the 10000 series) – up to 4 per magazine if I remember correctly.
    • The magazines are then put into cages. 10 magazines per cage? 
  • Disks can also be in cages directly (in the non-10000 series, such as the 7200 and 7400 series). 
    • Note: I could be getting the model numbers and figures wrong so take these with a pinch of salt!
  • Important thing to remember is you have disks and these disks are in cages (either directly or as part of magazines). 
  • Let’s focus on the 7200/ 7400 series for now.
    • A cage is a 2U enclosure. 
    • In the front you have the disks (directly, no magazines here). 
    • In the rear you have two nodes. 
      • Yes, two nodes! 3Pars always work in pairs. So you need two nodes. 
      • 7200 series can only have two nodes max/ min. 
      • 7400 series can have two nodes min, four nodes max. 
      • So it’s better to get a 7400 series even if you only want two nodes now as you can upgrade later on. With a 7200 series you are stuck with what you get. 
      • The 7200 series is still on the market because it's a bit cheaper. That's because it also has lower specs (it's never going to do as much as a 7400 series). 
    • What else? Oh yeah, drive shelves. (Not sure if I am getting the term correct here). 
    • Drive shelves are simply cages with only drives in them. No nodes!
    • There are limits on how many shelves a node can control.
      • 7200 series has the lowest limit. 
      • Followed by a two node 7400 series.
      • Followed by a four node 7400 series. This dude has the most!
        • A four node 7400 is 4Us of nodes (on the rear side).
        • The rest of the 42U (rack size) minus 4U (node+disks) = 38U is all drive shelves!
      • The number of drives in a shelf varies, if I remember correctly. As in, you can have larger drives (in physical size and capacity), so fewer fit per shelf. 
      • Or you could have more drives of smaller size/ lower capacity. 
        • Sorry I don’t have clear numbers here! Most of this is from memory. Must get the slides and plug in more details later. 
    • Speaking of nodes, these are the brains behind the operation.
      • A node contains an ASIC (Application Specific Integrated Circuit). Basically a chip that’s designed for a specific task. Cisco routers have ASICs. Brocade switches have ASICs. Many things have ASICs in them. 
      • A node contains a regular CPU – for management tasks – and also an ASIC. The ASIC does all the 3Par stuff: deduplication, handling metadata, optimizing traffic and writes (it skips zeroes when writing/ sending data – which is a big deal). 
      • The ASIC and one more thing (TPxx – 3Par xx) are the two 3Par innovations. Plus the fact that everything's built for storage, unlike a LeftHand, which is just a ProLiant server. 
      • Caching is a big deal with 3Pars. 
        • You have write caching. Which means whenever the 3Par is given a blob of data, the node that receives it (1) stores it in its cache, (2) sends it to its partner, and (3) tells whoever gave it the blob that the data is now written. Note that in reality the data is only in the cache; but since both nodes now have it in their caches it can be considered somewhat safe, so rather than wait for the disks to write the data and reply with a success, the node assures whoever gave it the data that the data is written to disk.
          • Write caching obviously improves performance tremendously! So it’s a big deal. 7400 series have larger caches than 7200.
          • This also means that if one node in a two-node pair is down, caching won't happen – because now the remaining node can't simply lie that the data is written. What happens if it too fails? There would be no other node with a cached copy. So before the node replies with a confirmation it must actually write the data to disk. Hence if one node fails, performance is affected.
        • And then you have read caching. When data is read, additional data around it is read ahead. This improves performance if that additional data is required next. (If it is required, it's a cache hit; else a cache miss.) 
        • Caching is also involved in de-duplication. 
          • Oh, de-duplication only happens for SSDs. 
            • And it doesn’t happen if you want Adaptive Optimization (whatever that is). 
      • There is Remote Copy – which is like SRM for VMware. You can have data replicated from one 3Par system to another. And this can happen between the various models.
      • Speaking of which, all 3Par models have the same OS etc. So that’s why you can do copying and such easily. And manage via a common UI. 
        • There's a GUI. And a command line interface (CLI). The CLI commands are also available from a regular command prompt. And there are PowerShell cmdlets, now in beta testing?
  • Data is stored in chunklets. These are 1 GB units, spread across disks.
  • There’s something called a CPG (Common Provisioning Group). This is something like a template or a definition. 
    • You define a CPG that says (for instance) you want RAID1 (mirroring) with 4 sets (i.e. 4 copies of the data) making use of a particular set of physical disks.
  • Then you create Logical Disks (LDs) based on a CPG. You can have multiple logical disks – based on the same or different CPGs. 
  • Finally you create volumes on these Logical Disks. These are what you export to hosts via iSCSI & friends. 
  • We are not done yet! :) There are also regions. These are 128 MB blocks of data. 8 x 128 MB = 1024 MB (1 GB) = the size of a chunklet. So a chunklet has 8 regions.
  • A region is per disk. So when I said above that a chunklet is spread across disks, what I really meant is that a chunklet is made up of 8 regions and each region is on a particular disk. That way a chunklet is spread across multiple disks. (Tada!)
    • A chunklet is to a physical disk what a region is to a logical disk. 
    • So while a chunklet is 1 GB and doesn’t mean much else, a region has properties. If a logical disk is RAID1, the region has a twin with a copy of the same data. (Or does it? I am not sure really!) 
  • And lastly … we have steps! This is 128 KB (by default for a certain class of disks, it varies for different classes, exact number doesn’t matter much). 
    • Here’s what happens: when the 3Par gets some data to be written, it writes 128 KB (a step size) to one region, then 128 KB to another region, and so on. This way the data is spread across many regions.
    • And somehow that ties in to chunklets and how 3Par is so cool and everything is redundant and super performant etc. Now that I think back I am not really sure what the step does. It was clear to me after I asked the instructor many questions and he explained it well – but now I forget it already! Sigh. 
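The arithmetic in the chunklet/region/step bullets, written out (the steps-per-region figure is my own extrapolation, not something from the course):

```shell
# Chunklet / region / step arithmetic from the notes above
chunklet_mb=1024                        # a chunklet is 1 GB
region_mb=128                           # a region is 128 MB
step_kb=128                             # default step size: 128 KB
echo $(( chunklet_mb / region_mb ))     # regions per chunklet -> 8
echo $(( region_mb * 1024 / step_kb ))  # 128 KB steps that fit in one region -> 1024
```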

Some day soon I will update this post or write a new one that explains all these better. But for now this will have to suffice. Baby steps!