VMware: “A specified parameter was not correct” error

Was trying to delete a VM template but it kept throwing the above error. I had a feeling this was because the underlying disk was missing from the datastore (I couldn’t find any folder with the same name as the template in the datastore), but there was no way to confirm this as you can’t right-click a template and view its settings the way you can with a VM.

Thanks to PowerCLI though, you can:

The Get-HardDisk cmdlet can be used to return the hard disks used by a VM or template. It can even be used to return all hard disks on a datastore (or in a specified path on the datastore):

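Something along these lines does the trick (the template and datastore names are made up):

    # Hard disks used by a template (use Get-VM instead for a regular VM)
    Get-Template -Name "Win2008R2-Template" | Get-HardDisk

    # All hard disks on a datastore, or under a specific folder on it
    Get-HardDisk -Datastore "Datastore01"
    Get-HardDisk -DatastorePath "[Datastore01] SomeFolder/"

If the underlying disk really is missing, the first command should return nothing – which is the confirmation I was after.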

PowerCLI – List all VMs in a cluster along with number of snapshots and space usage

More as a note to myself than anyone else, here’s a quick and dirty way to list all the VMs in a cluster with the number of snapshots, the used space, and the provisioned space. Yes you could get this information from the GUI but I like PowerShell and am trying to spend more time with PowerCLI.

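Something like this (the cluster name is made up; spread over a few lines for readability):

    Get-Cluster "Cluster01" | Get-VM |
      Select-Object Name,
        @{N="Snapshots";E={@($_ | Get-Snapshot).Count}},
        @{N="UsedSpaceGB";E={[math]::Round($_.UsedSpaceGB,2)}},
        @{N="ProvisionedSpaceGB";E={[math]::Round($_.ProvisionedSpaceGB,2)}} |
      Sort-Object Name | Format-Table -AutoSize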

PowerShell – Create a list of all Exchange mailboxes in an OU with mailbox size, Inbox size, etc

As the title says, here’s a one-liner to quickly create a list of all Exchange mailboxes in an OU with mailbox size, Inbox size, Sent Items size, and the number of items in each of these folders.

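Something along these lines should do it (the OU path is made up; strictly speaking it’s a pipeline over several lines rather than one):

    Get-Mailbox -OrganizationalUnit "contoso.com/Users/SomeOU" -ResultSize Unlimited | ForEach-Object {
      $stats = Get-MailboxStatistics $_
      $inbox = Get-MailboxFolderStatistics $_ -FolderScope Inbox | Where-Object { $_.FolderPath -eq "/Inbox" }
      $sent  = Get-MailboxFolderStatistics $_ -FolderScope SentItems | Where-Object { $_.FolderPath -eq "/Sent Items" }
      # one object per mailbox with the sizes and item counts we care about
      New-Object PSObject -Property @{
        Name          = $_.DisplayName
        MailboxSize   = $stats.TotalItemSize
        TotalItems    = $stats.ItemCount
        InboxSize     = $inbox.FolderSize
        InboxItems    = $inbox.ItemsInFolder
        SentItemsSize = $sent.FolderSize
        SentItems     = $sent.ItemsInFolder
      }
    }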

FYI: Self Encrypting Drives must be uninitialized for BitLocker Hardware encryption

Got myself a new 1TB Crucial MX200 SSD today. This is a Self Encrypting Drive like my other SSDs. When I tried enabling BitLocker on it as I usually do, I noticed that it was asking me about how to encrypt the drive and taking more time with the encryption than I have seen in the past with SED drives that support the TCG OPAL standard. 

Not good if you get this screen!

Something was not right. So I went back to Microsoft’s page on BitLocker and SEDs and noticed that one of the requirements is that the drive must be uninitialized! Damn! In the past I usually enable encryption and then copy over data, but today I had copied the data first (thus initializing the drive and creating partitions) and then tried to enable encryption. Obviously that was a no-go, so I had to copy the data off the drive, uninitialize it, and then turn on BitLocker encryption.

Uninitializing is easy via diskpart:
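
(From an elevated command prompt; everything after the first line is typed at the DISKPART prompt. The disk number here is made up – be absolutely sure you select the right one, as clean wipes that disk!)

    diskpart
    list disk
    select disk 1
    clean
    exit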

Now Disk Management will show the disk as uninitialized. 


Create partitions as usual, but turn on BitLocker encryption before writing any data to the disk. This time it will be a one-second operation and you won’t get a prompt like the one above.

To confirm that the drive is hardware encrypted (in case you wonder whether BitLocker didn’t just zip through coz the drive had no data on it) use the manage-bde command:
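
(Assuming the drive letter is D:)

    manage-bde -status D: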

As you can see from the “Encryption Method” line, the drive is hardware encrypted.

Load balancing in vCenter and ESXi

One of the things you can do with a portgroup is define teaming for the underlying physical NICs.

(Screenshot: the teaming settings of a portgroup.)

If you don’t do anything here, the default setting of “Route based on originating virtual port” applies. What this does is straightforward: each virtual port on the virtual switch is mapped to a physical NIC behind the scenes, so all traffic to & from that virtual port goes via that physical NIC. Since your virtual NIC connects to a virtual port, this is equivalent to saying all traffic for that virtual NIC happens via a particular physical NIC.

In the screenshot above, for instance, I have two physical NICs dvUplink1 and dvUplink2. If I left teaming at the default setting and say I had 4 VMs connecting to 4 virtual ports, chances are two of these VMs will use dvUplink1 and two will use dvUplink2. They will continue using these mappings until one of the dvUplinks dies, in which case the other will take over – so that’s how you get failover.

This is pretty straightforward and easy to set up. And the only disadvantage, if any, is that you are limited to the bandwidth of a single physical NIC. If each of dvUplink1 & dvUplink2 were 1Gb NICs it isn’t as though the underlying VMs had 2Gb (2 NICs x 1Gb each) available to them. Since each VM is mapped to one uplink, 1Gb is all they get.

Moreover, if say two VMs were mapped to an uplink and one of them was hogging all the bandwidth of that uplink while the other uplink was relatively free, the second VM won’t automatically be re-mapped to the free uplink to make better use of resources. So that’s a bummer too.

A neat thing about “Route based on originating virtual port” is that the virtual port is fixed for the lifetime of the virtual machine so the host doesn’t have to calculate which physical NIC to use each time it receives traffic to & from the virtual machine. Only if the virtual machine is powered off, deleted, or moved to a different host does it get a new virtual port.

The other options are:

  • Route based on MAC hash
  • Route based on IP hash
  • Route based on physical NIC load
  • Explicit failover

We’ll ignore the last one for now – that just tells the host to use the first physical NIC in the list for all VMs.
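
As an aside, you can view and set the teaming policy via PowerCLI too. A minimal sketch for a standard vSwitch portgroup, with made-up host and portgroup names (“Route based on physical NIC load” is distributed-switch only, so it isn’t among this cmdlet’s options):

    # View the current teaming policy of the portgroup
    Get-VMHost "esx01.lab.local" | Get-VirtualPortGroup -Name "VM Network" | Get-NicTeamingPolicy

    # Change the policy to "Route based on IP hash"
    Get-VMHost "esx01.lab.local" | Get-VirtualPortGroup -Name "VM Network" |
      Get-NicTeamingPolicy | Set-NicTeamingPolicy -LoadBalancingPolicy LoadBalanceIP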

“Route based on MAC hash” is similar to “Route based on originating virtual port” in that it uses the MAC address of the virtual NIC instead of the virtual port. I am not very clear on how this is better than the latter. Since the MAC address of a virtual machine is usually constant (unless it is changed or a different virtual NIC is used), all traffic from that MAC address will always use the same physical NIC. Moreover, there is additional overhead in that the host has to check each packet for the MAC address and decide which physical NIC to use. VMware documentation says it provides a more even distribution of traffic, but I am not clear how.

“Route based on physical NIC load” is a good one. It starts off like “Route based on originating virtual port”, but if a physical NIC is loaded, the virtual ports mapped to it are moved to a physical NIC with less load! This load balancing option is only available for distributed switches. Every 30s the distributed switch checks the physical NIC load, and if it exceeds 75% the virtual port of the VM with the highest utilization is moved to a different physical NIC. So you have the advantages of “Route based on originating virtual port” with one of its major disadvantages removed.

In fact, except for “Route based on IP hash” none of the other load balancing mechanisms have an option to utilize more than a single physical NIC bandwidth. And “Route based on IP hash” does not do this entirely as you would expect.

“Route based on IP hash”, as the name suggests, does load balancing based on the IP hash of the virtual machine and the remote end it is communicating with. Based on a hash of these two IP addresses all traffic for the communication between these two IPs is sent through one NIC. So if a virtual machine is communicating with two remote servers, it is quite likely that traffic to one server goes through one physical NIC while traffic to the other goes via another physical NIC – thus allowing the virtual machine to use more bandwidth than that of one physical NIC. However – and this is an often overlooked point – all traffic between the virtual server and one remote server is still constrained by the bandwidth of the physical NIC it happens via. Once traffic is mapped to a particular physical NIC, if more bandwidth is required or the physical NIC is loaded, it is not as though an additional physical NIC is used. This is a catch with “Route based on IP hash” that’s worth remembering.

If you select “Route based on IP hash” as a load balancing option you get two warnings:

  • With IP hash load balancing policy, all physical switch ports connected to the active uplinks must be in link aggregation mode.
  • IP hash load balancing should be set for all port groups using the same set of uplinks.

What this means is that unlike the other load balancing schemes where there was no additional configuration required on the physical NICs or the switch(es) they connect to, with “Route based on IP hash” we must combine/ bond/ aggregate the physical NICs as one. There’s a reason for this.

In all the other load balancing options the virtual NIC MAC is associated with one physical NIC (and hence one physical port on the physical switch). So incoming traffic for a VM knows which physical port/ physical NIC to go via. But with “Route based on IP hash” there is no such one to one mapping. This causes havoc with the physical switch. Here’s what happens:

  • Different outgoing traffic flows choose different physical NICs. With each of these packets the physical switch keeps updating its MAC address table with the port the packet came from. Say, for instance, the two physical NICs are connected to physical switch Port1 and Port2 and the virtual NIC MAC address is VMAC1. When an outgoing packet goes via the first physical NIC, the switch updates its table to reflect that VMAC1 is connected to Port1. Subsequent traffic flows might continue using the first physical NIC, so all is well. Then say a traffic flow uses the second physical NIC. Now the switch maps VMAC1 to Port2; then a traffic flow could use Port1 so the mapping changes back to Port1, and then Port2, and so on …
  • When incoming traffic hits the physical switch for MAC address VMAC1, the switch will look up its tables and decide which port to send traffic on. If the current mapping is Port1 traffic will go out via that; if the current mapping is Port2 traffic will go out via that. The important thing to note is that the incoming traffic flow port chosen is not based on the IP hash mapping – it is purely based on whatever physical port the switch currently has mapped for VMAC1.
  • So what’s required is a way of telling the physical switch that the two physical NICs are to be considered as bonded/ aggregated such that traffic from either of those NICs/ ports is to be treated accordingly. And that’s what EtherChannel does. It tells the physical switch that the two ports/ physical NICs are bonded and that it must route incoming traffic to these ports based on an IP hash (which we must tell EtherChannel to use while configuring it).
  • EtherChannel also helps with the MAC address table in that now there can be multiple ports mapped to the same MAC address. Thus in the above example there would now be two mappings VMAC1-Port1 and VMAC1-Port2 instead of them over-writing each other!

“Route based on IP hash” is a complicated load balancing option to implement because of EtherChannel. And as I mentioned above, while it does allow a virtual machine to use more bandwidth than a single physical NIC, an individual traffic flow is still limited to the bandwidth of a single physical NIC. Moreover there is more overhead on the host because it has to calculate the physical NIC used for each traffic flow (essentially each packet).

Prior to vCenter 5.1 only static EtherChannel was supported (unless you use a third party virtual switch such as the Cisco Nexus 1000V). Static EtherChannel means you explicitly bond the physical NICs. But from vCenter 5.1 onwards the inbuilt distributed switch supports LACP (Link Aggregation Control Protocol) which is a way of automatically bonding physical NICs. Enable LACP on both the physical switch and distributed switch and the physical NICs will automatically be bonded.

(To enable LACP on the physical NICs go to the uplink portgroup that these physical NICs are connected to and enable LACP).

(Screenshot: enabling LACP on the uplink portgroup.)

That’s it for now!

Update

Came across this blog post which covers pretty much everything I covered above but in much greater detail. A must read!

VCSA: Unable to connect to server. Please try again.

Most likely you set the VCSA to regenerate its certificates upon reboot and forgot to uncheck it after the reboot. (It’s under Admin > Certificate Regeneration Enabled). So each time you reboot VCSA gets a new certificate and your browser throws the above error.

The fix is to refresh the page (Ctrl+F5 in Firefox) so the new certificate is fetched and you get a prompt about it.

A very brief intro to Port Groups, Standard and Distributed switches

A year ago I went for VMware training but never got around to using it at work. Now I am finally into it, but I’ve forgotten most of the concepts. And that sucks!

So I am slowly re-learning things as I go along. I am in this weird state where I sort of remember bits and pieces from last year but at the same time I don’t really remember them.

What I have been reading about these past few days (or rather, trying to read these past few days) is networking. The end goal is distributed switches but for now I am starting with the basics. And since I like to blog these things as I go along, here we go.

You have a host. The server that runs ESXi (hypervisor).

This host has physical NICs. Hopefully oodles of them, all connected to your network.

This server runs virtual machines (a.k.a guests). These guests see virtual NICs that don’t really exist except in software, exposed by ESXi.

What you need is for all these virtual NICs to be able to talk to each other (if needed) as well as to the outside world (via the physical NICs and the switches they connect to).

You could create one big virtual switch and connect all the physical and virtual NICs to it. (This virtual switch is again something which does not physically exist). All the guests can thus talk to each other (as they are on the same switch) and also talk to the outside world (because the virtual switch is connected to the outside world via whatever it is connected to).

But maybe you don’t want all the virtual NICs to be able to talk to each other. You want little networks in there – a la VLANs – to isolate certain traffic from the rest. There are two options here:

  1. Create separate virtual switches for each network, and assign some virtual NICs to some switches. The physical NICs that connect to these virtual switches will connect to separate physical switches so you are really limited in the number of virtual switches you have by the number of physical NICs you have. Got 2 physical NICs, you can create 2 virtual switches; got 5 physical NICs, you can create 5 virtual switches.
  2. Create one big virtual switch as before, but use port groups. Port groups are the VMware equivalent of VLANs (well, sort of; they do more than just VLANs). They are a way of grouping the virtual ports on the virtual switch such that only the NICs connected to a particular port group can talk to each other. You can create as many port groups as you want (within limits) and assign all your physical NICs to this virtual switch and use VLANs so the traffic flowing out of this virtual switch to the physical switch is on separate networks. Pretty nice stuff!

(In practice, even if you create separate virtual switches you’d still create a port group on that – essentially grouping all the ports on that switch into one. That’s because port groups are used to also apply policies to the ports in the group. Policies such as security, traffic shaping, and load balancing/ NIC teaming of the underlying physical NICs. Below is a screenshot of the options you have with portgroups).

Example of a Portgroup
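
To make option 2 concrete, here’s a minimal PowerCLI sketch (all names made up): one standard switch, two VLAN-separated portgroups.

    # A standard switch on a host, backed by two physical NICs
    $vs = New-VirtualSwitch -VMHost "esx01.lab.local" -Name "vSwitch1" -Nic vmnic2,vmnic3

    # Two portgroups on it, isolated from each other via VLANs
    New-VirtualPortGroup -VirtualSwitch $vs -Name "Servers" -VLanId 10
    New-VirtualPortGroup -VirtualSwitch $vs -Name "DMZ" -VLanId 20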

Now onto standard and distributed switches. In a way both are similar – they are both virtual switches – but the difference is that a standard switch exists on & is managed by a host whereas a distributed switch exists on & is managed by vCenter. You create a distributed switch using vCenter and then you go to each host and add its physical NICs to the distributed switch. As with standard switches, you can create portgroups in distributed switches and assign VM virtual NICs to these portgroups.

An interesting thing when it comes to migration (obvious, but I wasn’t sure about it initially): say you have a host with two NICs – one a member of a standard switch, the other of a distributed switch – but both NICs connect to the same physical network (or VLAN). If some VMs on this host are on the standard switch and others are on the distributed switch, all these VMs can still talk to each other through the underlying physical network. Useful when you want to migrate stuff.

I got side tracked at this point with other topics so I’ll conclude this post here for now.

Adding DHCP scope options via PowerShell

Our deployment team needed a few DHCP options set for all our scopes. There was a brickload of these scopes – no way I was going to go to each one of them and right-click to add the options! I figured this was one for PowerShell!

Yes, I ended up taking longer with PowerShell coz I didn’t know the DHCP cmdlets, but hey: (1) now I know! and (2) next time I’ve got to do this I can get it done way faster. And once I get this blog post written I can refer back to it then.

The team wanted four options set:

  • Predefined Option 43 – 010400000000FF
  • Custom Option 60 – String – PXEClient
  • Predefined Option 66 – 10.x.x.x
  • Predefined Option 67 – boot\x86\wdsnbp.com

PowerShell 4 (included in Windows 8.1 and Server 2012 R2) has a DHCP module providing a bunch of DHCP cmdlets.

First things first – I needed to filter out the scopes I had to target. Get-DhcpServerv4Scope is your friend for that. It returns an array of scope objects – filter these the usual way. For example:
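
(The server name and filter are made up:)

    $scopes = Get-DhcpServerv4Scope -ComputerName "dhcp01" |
      Where-Object { $_.Name -like "Office*" }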

Now, notice that one of the options to be added is a custom one, meaning it doesn’t exist by default. Via the GUI you would add it by right-clicking “IPv4” and selecting “Set Predefined Options”, then adding the option definition. But I am doing the whole thing via PowerShell, so here’s what I did:
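
(Server name made up; this defines option 60 as a string:)

    Add-DhcpServerv4OptionDefinition -ComputerName "dhcp01" -OptionId 60 -Name "PXEClient" -Type String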

To add an option, the Set-DhcpServerv4OptionValue cmdlet is your friend. For example:
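
(Server name and scope made up; options 66 and 67 are plain strings:)

    Set-DhcpServerv4OptionValue -ComputerName "dhcp01" -ScopeId 10.0.1.0 -OptionId 66 -Value "10.x.x.x"
    Set-DhcpServerv4OptionValue -ComputerName "dhcp01" -ScopeId 10.0.1.0 -OptionId 67 -Value "boot\x86\wdsnbp.com"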

I had a bit of trouble with option 43 because it has a vendor defined format and I couldn’t input the value as given. From the help pages though I learnt that I have to give it in chunks of hex. Like thus:
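
(That is, 010400000000FF becomes comma-separated hex bytes; server name and scope made up as before:)

    Set-DhcpServerv4OptionValue -ComputerName "dhcp01" -ScopeId 10.0.1.0 -OptionId 43 -Value 0x01,0x04,0x00,0x00,0x00,0x00,0xFF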

Wrapping it all up, here’s what I did (once I added the new definition):
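
(Names made up as before; loop over the filtered scopes and set all four options on each:)

    $scopes | ForEach-Object {
      Set-DhcpServerv4OptionValue -ComputerName "dhcp01" -ScopeId $_.ScopeId -OptionId 43 -Value 0x01,0x04,0x00,0x00,0x00,0x00,0xFF
      Set-DhcpServerv4OptionValue -ComputerName "dhcp01" -ScopeId $_.ScopeId -OptionId 60 -Value "PXEClient"
      Set-DhcpServerv4OptionValue -ComputerName "dhcp01" -ScopeId $_.ScopeId -OptionId 66 -Value "10.x.x.x"
      Set-DhcpServerv4OptionValue -ComputerName "dhcp01" -ScopeId $_.ScopeId -OptionId 67 -Value "boot\x86\wdsnbp.com"
    }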

And that’s it!

Install VMware Tools is grayed out in Workstation

Came across this problem today and couldn’t find any Google hits that helped me. Finally hit upon a solution.

VMware Tools requires the guest to have a CD drive. In my case the physical host doesn’t have a CD drive, and I had no need to mount any CDs in the guest, so while creating the guest I removed the CD drive. No CD drive => no place for VMware to insert the Tools CD. But rather than complain about it, Workstation simply grays out the option.

So that’s it. Enable the CD drive and you will be able to install VMware Tools! Hope this helps.

Get a list of recently installed Windows updates via the command line

In a previous post I gave a DISM command to get a list of installed Windows Updates:
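
(Along these lines:)

    dism /online /get-packages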

While useful, that command has no way of filtering results based on some criteria.

If you are on Windows 8 or above the Get-WindowsPackage cmdlet can be of use:
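
(For example:)

    Get-WindowsPackage -Online |
      Where-Object { $_.InstallTime -gt (Get-Date).AddDays(-15) } |
      Select-Object PackageName, InstallTime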

This gets me all updates installed in the last 15 days. 

Another alternative (on pre-Windows 8 machines) is good ol’ WMIC:
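
(Like so:)

    wmic qfe list brief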

The above gives a table of the installed hotfixes, including their IDs and installed-on dates.

For more details more switches can be used:
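
(For instance:)

    wmic qfe list full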

The full listing shows every available field for each update, which also gives an idea of the criteria available for filtering.

So how can I filter this output like I did with PowerShell? Easy – use WQL (WMI Query Language). Inspired by a blog post I found (which I am sure I have referred to in the past too), either of the following will do the trick:
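
(The date pattern below is just an example – updates installed in May 2015; the two forms are equivalent WMIC syntaxes:)

    wmic qfe where "InstalledOn like '5/%/2015'" list brief

-or-

    wmic qfe where (InstalledOn like "5/%/2015") list brief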

And if you want to format the output with specific fields:
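
(Picking a few fields by way of example:)

    wmic qfe where "InstalledOn like '5/%/2015'" get HotFixID,Description,InstalledOn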

This gives just those fields for the matching updates.

This includes Updates, Hotfixes, and Security Updates. If you want to filter down further, that too is possible (just mentioning these as a reference to my future self). Do a specific match:
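
(For instance, exactly matching the Description field:)

    wmic qfe where "Description = 'Security Update'" list brief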

Or a wildcard:
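
(Matching a substring:)

    wmic qfe where "Description like '%Security%'" list brief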

Or a negation:
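
(Excluding a value:)

    wmic qfe where "Description != 'Security Update'" list brief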

These two links (WQL and WHERE clauses) were of use in picking up the WQL syntax. They are not very explanatory but you get an idea by trial and error. Once I had picked up the syntax I came across this about_WQL page that’s part of the PowerShell documentation and explains WQL operators. Linking to it here as a reference to myself and others. 

Unlike with PowerShell, I don’t know how to make WMIC use a greater-than operator and simply specify a date. I tried something like this (updates installed after 12th May 2015):
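
(Presumably something like this:)

    wmic qfe where "InstalledOn > '5/12/2015'" list brief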

But the results include some updates from 2013 & 2014 too. Not sure what’s wrong and I am not in the mood to troubleshoot at the moment. The like operator does the trick well for me currently. 

3Par Course: Day 1

Got a 3Par training going on at work since today. I have no experience with 3Pars so this is my first encounter with it really. I know a bit about storage thanks to having worked with HP LeftHands (now called HP StoreVirtual) and Windows Server 2012 Storage Spaces – but 3Par is a whole another beast!

Will post more notes on today’s material tomorrow (hopefully!) but here’s some bits and pieces before I go to sleep:

  • You have disks.
  • These disks are in magazines (in the 10000 series) – up to 4 per magazine if I remember correctly.
    • The magazines are then put into cages. 10 magazines per cage? 
  • Disks can also be in cages directly (in the non-10000 series, such as the 7200 and 7400 series). 
    • Note: I could be getting the model numbers and figures wrong so take these with a pinch of salt!
  • Important thing to remember is you have disks and these disks are in cages (either directly or as part of magazines). 
  • Let’s focus on the 7200/ 7400 series for now.
    • A cage is a 2U enclosure. 
    • In the front you have the disks (directly, no magazines here). 
    • In the rear you have two nodes. 
      • Yes, two nodes! 3Pars always work in pairs. So you need two nodes. 
      • 7200 series can only have two nodes max/ min. 
      • 7400 series can have two nodes min, four nodes max. 
      • So it’s better to get a 7400 series even if you only want two nodes now as you can upgrade later on. With a 7200 series you are stuck with what you get. 
      • 7200 series is still in the market coz it’s a bit cheaper. That’s coz it also has lower specs (coz it’s never going to do as much as a 7400 series). 
    • What else? Oh yeah, drive shelves. (Not sure if I am getting the term correct here). 
    • Drive shelves are simply cages with only drives in them. No nodes!
    • There are limits on how many shelves a node can control.
      • 7200 series has the lowest limit. 
      • Followed by a two node 7400 series.
      • Followed by a four node 7400 series. This dude has the most!
        • A four node 7400 is 4Us of nodes (on the rear side).
        • The rest of the 42U (rack size) minus 4U (node+disks) = 38U is all drive shelves!
      • Number of drives in the shelf varies if I remember correctly. As in you can have larger size drives (physical size and storage) so there’s less per shelf. 
      • Or you could have more drives but smaller size/ lower storage. 
        • Sorry I don’t have clear numbers here! Most of this is from memory. Must get the slides and plug in more details later. 
    • Speaking of nodes, these are the brains behind the operation.
      • A node contains an ASIC (Application Specific Integrated Circuit). Basically a chip that’s designed for a specific task. Cisco routers have ASICs. Brocade switches have ASICs. Many things have ASICs in them. 
      • A node contains a regular CPU – for management tasks – and also an ASIC. The ASIC does all the 3Par stuff: deduplication, handling metadata, optimizing traffic and writes (it skips zeroes when writing/ sending data – which is a big deal).
      • The ASIC and one more thing (TPxx – 3Par xx are the two 3Par innovations). Plus the fact that everything’s built for storage, unlike a LeftHand which is just a Proliant Server. 
      • Caching is a big deal with 3Pars. 
        • You have write caching. Whenever the 3Par is given a blob of data, the node that receives it (1) stores it in its cache, (2) sends it to its partner, and (3) tells whoever gave it the blob that the data is now written. Note that in reality the data is only in the cache; but since both nodes now have it in their caches it can be considered somewhat safe, so rather than wait for the disks to write the data and reply with a success, the node assures whoever gave it the data that it is written to disk.
          • Write caching obviously improves performance tremendously! So it’s a big deal. 7400 series have larger caches than 7200.
          • This also means that if one node in a two-node pair is down, caching won’t happen – coz now the remaining node can’t simply lie that the data is written. (What happens if it too fails? There’d be no other node with a cached copy.) So before the node replies with a confirmation it must actually write the data to disk. Hence if one node fails, performance is affected.
        • And then you have read caching. When data is read, additional data around it is read ahead. This improves performance if that additional data is required next. (If it is required, that’s a cache hit; else it’s a cache miss.)
        • Caching is also involved in de-duplication. 
          • Oh, de-duplication only happens for SSDs. 
            • And it doesn’t happen if you want Adaptive Optimization (whatever that is). 
      • There is remote copying – which is like SRM for VMware. You can have data being replicated from one 3Par system to another. And this can happen between the various models.
      • Speaking of which, all 3Par models have the same OS etc. So that’s why you can do copying and such easily. And manage via a common UI. 
        • There’s a GUI. And a command line interface (CLI). The CLI commands are also available in a regular command prompt. And there’s PowerShell cmdlets now in beta testing?
  • Data is stored in chunklets. These are 1GB units, spread across disks.
  • There’s something called a CPG (Common Provisioning Group). This is something like a template or a definition. 
    • You define a CPG that says (for instance) you want RAID1 (mirroring) with 4 sets (i.e. 4 copies of the data) making use of a particular set of physical disks.
  • Then you create Logical Disks (LDs) based on a CPG. You can have multiple logical disks – based on the same or different CPGs. 
  • Finally you create volumes on these Logical Disks. These are what you export to hosts via iSCSI & friends. 
  • We are not done yet! :) There are also regions. These are 128 MB blocks of data. 8 x 128 MB = 1024 MB (1 GB) = the size of a chunklet, so a chunklet has 8 regions.
  • A region is per disk. So when I said above that a chunklet is spread across disks, what I really meant is that a chunklet is made up of 8 regions and each region is on a particular disk. That way a chunklet is spread across multiple disks. (Tada!)
    • A chunklet is to a physical disk what a region is to a logical disk. 
    • So while a chunklet is 1 GB and doesn’t mean much else, a region has properties. If a logical disk is RAID1, the region has a twin with a copy of the same data. (Or does it? I am not sure really!) 
  • And lastly … we have steps! A step is 128 KB (by default for a certain class of disks; it varies for different classes, and the exact number doesn’t matter much).
    • Here’s what happens: when the 3Par gets some data to be written, it writes 128 KB (a step size) to one region, then 128 KB to another region, and so on. This way the data is spread across many regions.
    • And somehow that ties in to chunklets and how 3Par is so cool and everything is redundant and super performant etc. Now that I think back I am not really sure what the step does. It was clear to me after I asked the instructor many questions and he explained it well – but now I forget it already! Sigh. 

Some day soon I will update this post or write a new one that explains all these better. But for now this will have to suffice. Baby steps!

After Windows Update KB 3061518 many websites stop working in IE

At work, every time some of our IT staff would access the BES server web console, IE would fail – not with a 404 “Page not found” error but with a “website cannot be found” error (with helpful hints to check your Internet connection, DNS, etc.).

The web console worked fine on my machine. I could ping the server from all machines and telnet to port 443 (the HTTPS port) from all machines. The IE security, SSL, and certificate related settings were the same across all machines (including mine). Firefox gave the same error on all the machines – something about cipher suites – which was odd, but at least consistent (and since we usually use IE with the console I wasn’t sure if Firefox had always given this error or if it was a recent occurrence).

Since it didn’t seem to be a browser specific setting, and the Firefox cipher error was odd, I felt it must be something at the machine level. Unlike Firefox (and Chrome), which use their own SSL suites, IE uses the Windows Secure Channel (Schannel) provider, so there must be something different between my install of Windows and the problematic users’. I had a look at Event Viewer when opening the site in IE and found the following error:

(Screenshot: the Schannel error in Event Viewer.) I couldn’t find many hits for that error (“The internal error state is 808”) but at least it was Schannel related, like I suspected.

Time to find out if there were any difference in the Windows updates between my machine and the users. The following command gives a list of Windows Updates and the installed date:
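
(WMIC again; something along these lines:)

    wmic qfe get HotFixID,InstalledOn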

The result was that a handful of updates had been installed in the last two weeks on the problematic machine but not on my machine.

Since the problem started recently it must be one of the updates installed on the 20th. Going through each of the KB articles I finally hit gold with the last one – KB 3061518. Here’s what the update does:

This security update resolves a vulnerability in Windows. The vulnerability could allow information disclosure when Secure Channel (Schannel) allows the use of a weak Diffie-Hellman ephemeral (DHE) key length of 512 bits in an encrypted Transport Layer Security (TLS) session. Allowing 512-bit DHE keys makes DHE key exchanges weak and vulnerable to various attacks. For an attack to be successful, a server has to support 512-bit DHE key lengths. Windows TLS servers send a default DHE key length of 1,024 bits. After you install this security update, the minimum allowed DHE key length on client computers is changed to 1,024 bits by default, instead of the previous minimum allowed key length of 512 bits.

The workaround is simple. Either fix TLS on the web server so its DHE key length is 1024 bits, or make a registry change on client computers so a key length of 512 bits is acceptable. I tested the latter on a user’s machine and that got the web console working, thus confirming the problem. Save the following as a .reg file and double-click it:
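
(If I remember the KB right, the value in question is ClientMinKeyBitLength; 0x200 hex = 512 bits:)

    Windows Registry Editor Version 5.00

    [HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\SecurityProviders\SCHANNEL\KeyExchangeAlgorithms\Diffie-Hellman]
    "ClientMinKeyBitLength"=dword:00000200

(Delete the value later to revert to the new 1024-bit default.)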

Reading more about the update I learnt that it’s a response to the logjam vulnerability against the Diffie-Hellman Key Exchange. I had forgotten about the Diffie-Hellman Key Exchange but thankfully I had written a blog post about this just a few months ago so I can go back and refresh my memory. 

Basically, Diffie-Hellman is an algorithm that helps two parties derive a shared secret key in public, such that anyone snooping on the conversation between them has no idea what that key is. The secret key can then be used to encrypt further communication between the parties (using symmetric encryption, which is way faster). A good thing about Diffie-Hellman is that there’s an ephemeral version too, in which the two parties generate a new shared secret each time they talk to each other. This ephemeral version is generally considered pretty secure and is recommended (because if someone ever does break the encryption of one conversation, they still have no way of knowing what was said in other conversations). The ephemeral version can also use elliptic curves to generate the shared secret, which has the advantage of being computationally faster too.

Anyways, ephemeral Diffie-Hellman is still a secure algorithm but there’s an unexpected problem with it. You see, back in the old days the US had export restrictions on crypto, so web servers had a mode wherein they could be asked to use export ciphers and would intentionally weaken security. A while back this “feature” was in the news thanks to the FREAK attack, and now it’s in the limelight again thanks to the logjam attack (which is what Microsoft’s KB fix above aims to address). Unlike FREAK, though, which was an implementation bug, logjam is expected behavior – it’s just that no one remembered this is how the system is supposed to behave!

Here’s what happens. As usual the two parties (client and web server) talk to each other in the open and come up with a shared secret key using the (ephemeral) Diffie-Hellman Key Exchange. In practice this should be foolproof, but if the web server supports the export ciphers mode I mentioned above, someone could come in between the client-server conversation and ask the server to switch to this export mode (all that’s required is that the client – or rather, someone pretending to be the client – asks the server for export grade ciphers). When the server gets this request it will intentionally choose weak parameters on its end as part of the Diffie-Hellman Key Exchange (specifically, it will choose a 512-bit key that will be used to generate the ephemeral key later). The server won’t tell the client it’s choosing weaker parameters because it thinks the client asked, so the client is none the wiser that someone is playing mischief in between. The client will use these weaker parameters, with the result that the third party can now try to decrypt the client-server conversation because the key the two sides agree upon is a weaker one.

Note that this has nothing to do with the key size of the certificate. And it has nothing to do with Diffie-Hellman Key Exchange being weak. It is only about servers still supporting this export mode and so the possibility that someone could get a server to use weaker parameters during the Key Exchange. 

The fix is simple really. You, the client, don’t have much of a say on what the server does. But you can insist that if a server could choose a 512 bits key as part of the Key Exchange process then you will simply refuse to deal with said server. And that’s what the KB fix above does. Once installed, if Schannel (on the client) notices that the web server it is talking to allows for 512 bits Diffie-Hellman keys it will simply refuse to talk to that web server. Period! (Note the emphasis here: even if the server only allows for 512 bits, i.e. it is not actually offering a weaker Diffie-Hellman parameter but merely supports export ciphers, the client will stop talking to it!). 

On the server side, the fix is again simple. You have two options – (1) disable export ciphers (see this and this for how to do so on Windows servers) and/ or (2) use Elliptic Curve Diffie-Hellman Key Exchange algorithms (because they don’t have the problems of regular Diffie-Hellman). Do either of these and you are safe. (In our case I’ll have to get the admins looking after this BES console to implement one of these fixes on the server side.)

And that’s it! After a long time I had something fun to do. A simple problem of a website not opening turned out to be an interesting exercise in troubleshooting and offered some learning afterwards. :) Before I conclude, here are two links that are worth reading for more info on logjam: the CloudFlare post, and the website of the team that discovered logjam (weakdh.org) with their recommendations for various web servers.

Cheers.

Update: This post by Matthew Green is a must read on logjam. Key takeaways (in case you weren’t already aware):

  • Logjam only applies to the non-Elliptic Curve variants of DHE. Regular DHE depends on the difficulty of solving the Discrete Logarithm problem; ECDHE depends on the difficulty of the elliptic curve version of that problem.
  • Discrete Logarithm is still difficult to solve, but for small values of its parameters (e.g. 512 bits) it is relatively easier. 
  • Things are also made easier by the fact that most web servers use a common set of prime numbers. Attackers can therefore precompute attack tables for these primes, degrade a connection so it uses one of the weaker common primes, and then decrypt it quickly.
  • Read the rest of that post for lots more interesting details. Thanks to Bruce Schneier’s blog for pointing to the post. 

Also, remember: this is about the key exchange, not about server identity. The server can use an RSA certificate to verify its identity but use Diffie-Hellman for the key exchange (that is in fact the preferred scenario, as Diffie-Hellman is better for key exchange). Here’s RFC 4492, which lists five different key exchange algorithms. Moral of the story: use Diffie-Hellman as usual, but either disable the export grade stuff (so there’s no chance of weaker parameters being used) or switch to the better Elliptic Curve Diffie-Hellman. This has nothing to do with the RSA or DSA certificates the server might otherwise use.

Microsoft .NET Framework 3.0 failed to be turned on. Status: 0x800f0906

While installing SQL Server 2012 on a Server 2012 R2 machine at work, the install kept failing. Looking at the setup logs I noticed that each attempt would try to install the .NET Framework 3.0 feature and fail. So I tried enabling the feature manually, but it kept failing with the above error: Update NetFx3 of package Microsoft .NET Framework 3.0 failed to be turned on. Status: 0x800f0906.

In the GUI I kept getting errors that the source media didn’t have the required files, which was odd as I was pointing to the right WIM file (later I even expanded the file to get to the Sources\SxS folder), but Event Viewer had the above error.

To get a better idea I tried installing the feature via the command line. The .NET Framework 3.0 feature is called NET-Framework-Core (you can find this via Get-WindowsFeature), so all I had to do was something along these lines:
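
(The -Source path is wherever you expanded the install media:)

    Install-WindowsFeature NET-Framework-Core -Source D:\Sources\SxS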

I didn’t expect this to work but strangely it did! No errors at all. Thought I’d post this in case it helps anyone else. 


RSA SecurID soft token: error reading sdtid file

So at work we are rolling out the newer BB OS 10.x devices. We use RSA keyfobs, and these have a software variant wherein you can load a software version of the keyfob into an app supplied by RSA. There are apps for iOS, Windows, Android, and BlackBerry OS, so it’s a pretty good option.

The way this works is that you create a file (ending with a .sdtid extension) which is the software keyfob (let’s call this a “soft token” from now on). You then import this into the app and it generates the changing codes. iOS, Windows, and Android have it easy in that there are RSA tools to convert the soft token to a QR code which you can simply scan to import it into the app. These OSes also don’t have the concept of separate spaces, so you, the IT admin, can easily email the soft token to your users and they can open & import it into the app. But BlackBerry users have a work space and a personal space on their device, and corporate email is in the work space, so you can only import the token into the RSA app if it’s installed from the app store in the work space.

Again, in practice that shouldn’t be an issue, but in our firm the RSA app isn’t appearing on the app store in the work space. The BES admins have published the app to the app store, yet it doesn’t appear. They are taking their sweet time troubleshooting, so I figured why not just install the app in the personal space and somehow get the soft token into that?

One option would be to create an email account in the personal space with the user’s private account and email the token to that. Too much effort! Another option would be to put it up on a website and access it via the personal space browser, then import it. Yet another option would be to just plug the device into the computer, copy the soft token to the micro SD card, and then import it. The last one is what I decided to go with.

Everything went well, but when it came to importing, some devices gave an error along the following lines: “error reading sdtid file”. Uninstalling and re-installing the RSA app did the trick. I am not sure how that helped, but my guess is that when the app launches it asks for permission to read your micro SD card etc., and the user ignored or denied that prompt. As a result the app couldn’t read the soft token from the micro SD card and threw the above error. That’s my guess at least. In any case, uninstall and re-install the app and that should do the trick! ;-) I found many forum posts with this question but none with a straightforward answer, so I thought I should make a blog post in case it helps someone.

Steps to root OnePlus One (Bacon)

Not a comprehensive post, just a note to myself on what I need to do every time the device is updated and loses root + recovery (though the latter can be avoided by disabling the option to update recovery during system upgrades in Developer Options).

  1. Get the Bacon Root Toolkit (BRT), cousin of the wonderful Nexus Root Toolkit.
  2. Enable ADB on the device (it’s under Developer Options).
  3. Connect device, confirm BRT has detected it as an ADB device.
    1. This doesn’t always happen. In such cases (a) try a different port, (b) try a different cable, and (c) check that the ADB device appears in Device Manager. If it does not, reinstall the Google drivers using BRT.
  4. Flash Custom Recovery (my choice is TWRP) from BRT. This is needed to root the device. Default Cyanogen Recovery can’t do this. This requires a couple of reboots. 
  5. Reboot into the Recovery and exit. I use TWRP, and when exiting it checks whether the device is rooted and offers to root it. Go ahead and do that.
  6. SuperSU (and SuperSU Pro) are what one uses to manage root. (Apparently CM 12 allows one to do this using the in-built Privacy Guard but I couldn’t find any options for that. Another option is Superuser, but I don’t think that works on Android 5.0 yet.)
    1. CM 12 also apparently has an option to enable/ disable root under Developer Options but I couldn’t find that on my device (before or after rooting).

That’s it! One of the reasons I went with the OnePlus One and Cyanogen was the hope that the device would stay rooted after updates, but that isn’t the case. I guess this is so the OS can stay compliant with Google. So each time I do a major update I need to repeat these steps. This doesn’t happen often, so by the time I get around to it I have usually forgotten what I did the last time. Hopefully I can come back and refer to this post the next time!