Creative Commons Attribution 4.0 International License
© Rakhesh Sasidharan


Word 2010 – The xxxx.docx cannot be opened because there are problems with the contents

Got the error in the title above for a Word document at work.

Obviously your mileage may vary in terms of the fix but here’s what I did so there’s a starting point.

Since this is a docx file I extracted it using 7-Zip. Went through the XML files in it but they seemed fine. Next I extracted another working docx file and replaced the “[Content_Types].xml” file of the broken one with that of the working one. Zipped it all back into a docx file, double-clicked, and I got a different error now but the document opened fine. It complained about comments or something missing, but all that was expected as obviously I had replaced a master file with another one. The fact that it opened fine (more or less) confirmed that this file must be the culprit.

Next I tried removing bits and pieces from the broken “[Content_Types].xml” file but that didn’t help. Finally I compared the two side by side, starting with the stuff I hadn’t removed. I noticed that the broken file had an entry like this:

The same one in the working file was different:

So I replaced the line in the broken file with the one in the working file, zipped it all back, double clicked and voila! it opens fine now. :)
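For anyone who would rather script the side-by-side comparison than eyeball it, here is a minimal Python sketch. It is not what I used above (that was 7-Zip and a text editor), and the function names `read_content_types` and `diff_content_types` are made up for illustration. It assumes both packages have a well-formed “[Content_Types].xml” at the root of the zip, and it only reports the entries whose content type differs between the two files.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by [Content_Types].xml in Open Packaging Conventions packages.
CT_NS = "{http://schemas.openxmlformats.org/package/2006/content-types}"


def read_content_types(docx):
    """Return {key: content type} read from a package's [Content_Types].xml.

    `docx` may be a path or a file-like object. Override entries are keyed
    by their PartName, Default entries by "*.<extension>".
    """
    with zipfile.ZipFile(docx) as zf:
        root = ET.fromstring(zf.read("[Content_Types].xml"))
    entries = {}
    for el in root:
        if el.tag == CT_NS + "Override":
            entries[el.get("PartName")] = el.get("ContentType")
        elif el.tag == CT_NS + "Default":
            entries["*." + el.get("Extension")] = el.get("ContentType")
    return entries


def diff_content_types(broken, working):
    """Return {key: (type in broken, type in working)} for differing entries."""
    a = read_content_types(broken)
    b = read_content_types(working)
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(set(a) | set(b))
        if a.get(key) != b.get(key)
    }
```

Calling `diff_content_types("broken.docx", "working.docx")` (both file names are placeholders) returns a dictionary of mismatched entries; an entry that exists in only one file shows up with `None` on the other side. Two unrelated documents will naturally differ in parts such as comments or footnotes, so the output is a list of candidates to look at, not a diagnosis.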

From this MIME types document it seems like “application/” belongs to “.docm” files, so at this point my guess is that the user copy-pasted something from another document and that possibly corrupted his destination document? I don’t know.

How to remove complex scripts from Word DOCX documents

Recently came across a Word document where some parts of the document seemed to ignore the general rules. The document was in English, and its language was set to English (U.S.) but certain parts were set to Arabic (Saudi) and none of the usual methods of selecting the text and marking it as English (U.S.) was helping. Very weird.

After a lot of fiddling around I also noticed that if I change the style of a paragraph containing such text, the adjoining text changes but this particular text stays as it is. I am able to change the font and size by applying them directly, but changes made via styles seem to get ignored.

Then I realized that although this text was in English, since it was marked as Arabic (Saudi) it was being treated as “complex script” text in the style definitions and hence had separate rules. I guess that at some point someone had marked this text as being Arabic (Saudi) and continued typing in English, or perhaps the original text was Arabic but someone had changed the font to an English one like “Times New Roman” and typed in English. So even though the text appeared as English, Word was in fact treating it as Arabic written in English (I guess). Anyways, the point was that Word was treating these blocks as complex scripts (as opposed to Latin for the other parts) and so the usual formatting rules didn’t apply to them. Moreover I could change the language from Arabic (Saudi) to Arabic (UAE), for instance, which seemed to support my theory that it was letting me change the language to other complex scripts, just not from complex to Latin and vice versa.

This being a DOCX file, it is really just a zip file. So I unzipped it using 7-Zip. Went to the word\styles.xml file (which I came across through trial and error actually, I went through pretty much all the XML files there) in the extracted folder and found the following:

Since I didn’t want the document to have any Arabic at all, I simply changed the “ar-SA” to “en-US”. Saved the XML file, went back to the extracted folder, and zipped all its contents up again. Renamed this from .zip to .docx and opened the document, and bingo! now all that complex script weirdness was gone! :)
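The same edit can be scripted without extracting anything to disk. The following is a rough Python sketch, not what I did above; the function name `replace_lang_in_styles` is hypothetical. It assumes a blunt text replacement inside word/styles.xml is acceptable, which is exactly what the manual edit amounted to, and it copies every other part of the package through unchanged.

```python
import zipfile


def replace_lang_in_styles(src_docx, dst_docx, old="ar-SA", new="en-US"):
    """Copy a docx, replacing `old` with `new` inside word/styles.xml only.

    Returns the number of occurrences that were replaced. `src_docx` and
    `dst_docx` may be paths or file-like objects, and must not be the same file.
    """
    old_b, new_b = old.encode("utf-8"), new.encode("utf-8")
    replaced = 0
    with zipfile.ZipFile(src_docx) as zin:
        with zipfile.ZipFile(dst_docx, "w", zipfile.ZIP_DEFLATED) as zout:
            for info in zin.infolist():
                data = zin.read(info.filename)
                if info.filename == "word/styles.xml":
                    replaced = data.count(old_b)
                    data = data.replace(old_b, new_b)
                # Reusing the original entry keeps its name and compression type.
                zout.writestr(info, data)
    return replaced
```

For example, `replace_lang_in_styles("report.docx", "report-fixed.docx")` (placeholder file names) writes a new copy and leaves the original alone. A return value of 0 means styles.xml did not contain the tag at all, in which case the language setting lives somewhere else in the package.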

(A note about zipping the folder back up. The format must be ZIP. Also, don’t zip the top-level folder itself, as then the zip file will contain the top-level folder followed by all the sub-folders. What we want is for the zip file to contain the extracted files and sub-folders directly at its root.)
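To make the point about the top-level folder concrete, here is a small Python sketch of re-zipping an extracted folder so that its contents sit at the root of the archive. The function name `zip_folder_as_docx` is made up for this example, and it assumes the output file is written somewhere outside the folder being zipped (otherwise the half-written archive could end up inside itself).

```python
import os
import zipfile


def zip_folder_as_docx(folder, docx_path):
    """Zip the *contents* of `folder` into `docx_path`.

    Entry names are made relative to `folder`, so "[Content_Types].xml",
    "word/...", etc. land at the root of the archive, not under a
    top-level folder.
    """
    with zipfile.ZipFile(docx_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for dirpath, _dirnames, filenames in os.walk(folder):
            for name in filenames:
                full_path = os.path.join(dirpath, name)
                # Zip entry names always use forward slashes.
                arcname = os.path.relpath(full_path, folder).replace(os.sep, "/")
                zf.write(full_path, arcname)
```

Something like `zip_folder_as_docx("extracted", "fixed.docx")` (both names are placeholders) gives a file that can be opened directly, with no renaming from .zip needed.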

Write-Host and Write-Output

Good to remember:

  1. Write-Host converts the object to a string representation and then outputs it to the host (console). There are two things to keep in mind here: (1) the output bypasses any pipelines etc. and is sent directly to the host; (2) the object is converted to a string representation using the ToString() method (which is present in all classes), and so the string representation may not necessarily be what you expect.

    Case in point:

  2. Write-Output is better in the sense that (1) it only passes the output to the next step of the pipeline (or to the host if we are at the end of the pipeline) and (2) no conversion happens, the object is output and formatted as it is.

    To give an example with the previously created table:

    This is the same as if you were to output the object directly:

    As previously mentioned, the formatting of such output depends on the ps1xml definitions for that object.

Naming PowerShell custom objects and setting their default display properties

Learnt a couple of things today. Not in depth, but now I am aware of these features and will explore them in depth someday.

I knew how to create custom objects in PowerShell and I have always tried to return output from my functions/scripts as custom objects. I was also aware that you can set the default display properties so the output is neater.

Say I create a new object like this:

Notice that when I output the object, all its properties are output. Usually I may not want that. I may want only the name property to be output, with the rest staying silent and shown only if asked for.

It’s possible to define the default properties you are interested in. This link gives you more details, the tl;dr summary of which is as follows:

  1. All objects contain a member object called PSStandardMembers which defines the default properties of the object.
  2. The PSStandardMembers object has a member object called DefaultDisplayPropertySet. This object contains a property called ReferencedPropertyNames which lists the default displayed properties of the object.
  3. Apart from DefaultDisplayPropertySet you have DefaultKeyPropertySet and DefaultDisplayProperty objects too. I am not sure what DefaultDisplayProperty does but DefaultKeyPropertySet is used when sorting and grouping objects.

To set the PSStandardMembers property of an object one does the following:

Notice now only the properties we specified are shown.

As an aside, and purely because I spent some time trying to figure this out, here’s how DefaultKeyPropertySet influences sorting:

(Thanks to this post which made me realize what DefaultKeyPropertySet does).

Back to DefaultDisplayPropertySet – the problem is that it doesn’t work in PowerShell v2. It’s a bug with PowerShell v2 and this Stack Overflow post gives a workaround which involves creating a temporary ps1xml file for the custom objects and defining its default properties.

I haven’t explored ps1xml files much but the gist of the matter is (1) they are what PowerShell uses to format object output and (2) you can create custom ps1xml files for your custom objects. The Stack Overflow post gives a function that takes an object and an array of properties and sets these properties as the default for that object. It’s a neat function and works as expected, but for a catch …

The catch is that since all custom objects have the same name you can’t set different default properties for different objects. Unless you give a name for the custom object, of course, which differentiates each type of custom object from the other. So how do you go about naming custom objects?

First up, how do you get the current name of an object? From my reflection post we know the following works:

To fiddle with the type name you have to use some hidden members that every object has. (This was another new thing learnt today. Didn’t know objects had hidden members too.) The way to see these is to run the Get-Member cmdlet with the -Force parameter. Have a look at the help for the -Force parameter:

From the help file it’s clear PSTypeNames is what we are interested in.

The members Clear and Add seem to be what we want:

Instead of clearing the existing types, one can Insert the new type to the top of the list too:

Gotta love it when things fall into place and you have a language that makes it easy to do all these things!

My thanks to this and this post for pointing me towards PSTypeNames.