Creative Commons Attribution 4.0 International License
© Rakhesh Sasidharan

Word 2010 – The xxxx.docx cannot be opened because there are problems with the contents

Got the error in the title above for a Word document at work.

Obviously your mileage may vary in terms of the fix but here’s what I did so there’s a starting point.

Since this is a docx file I extracted it using 7-Zip. Went through the XML files in it but they seemed fine. Next I extracted another, working docx file and replaced the “[Content_Types].xml” file of the broken one with that of the working one. Zipped it all back into a docx file, double clicked, and got a different error, but the document opened fine. It complained about comments or something missing, but that was expected as I had replaced a master file with one from a different document. The fact that it opened fine (more or less) confirmed that this file must be the culprit.
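The same swap can be scripted instead of done by hand in 7-Zip. Here is a minimal sketch in Python using only the standard `zipfile` module; the function name `replace_part` and the file names in the usage note are made up for illustration and are not from my original steps.

```python
import zipfile


def replace_part(broken_path, donor_path, out_path, part="[Content_Types].xml"):
    """Copy broken_path to out_path, but take `part` from donor_path.

    Hypothetical helper: neither input file is modified.
    """
    # Read the replacement part from the working (donor) document.
    with zipfile.ZipFile(donor_path) as donor:
        donor_bytes = donor.read(part)

    # Copy every entry of the broken document, swapping in the donor's part.
    with zipfile.ZipFile(broken_path) as src, \
         zipfile.ZipFile(out_path, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            if info.filename == part:
                data = donor_bytes
            else:
                data = src.read(info.filename)
            dst.writestr(info.filename, data)
```

Something like `replace_part("broken.docx", "working.docx", "test.docx")` would then give a test document to open in Word. As with the manual approach, expect Word to complain about parts the donor file doesn't know about.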

Next I tried removing bits and pieces from the broken “[Content_Types].xml” file but that didn’t help. Finally I compared the two side by side, starting with the stuff I hadn’t removed. I noticed that the broken file had an entry like this:

The same one in the working file was different:

So I replaced the line in the broken file with the one in the working file, zipped it all back, double clicked and voila! it opens fine now. :)
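The side-by-side comparison can also be done programmatically, which helps when the file is one long line of XML. A rough sketch in Python, assuming both files follow the usual Open Packaging layout of `Default` and `Override` entries; the function names are my own and purely illustrative.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by [Content_Types].xml in Open Packaging Conventions files.
CT_NS = "{http://schemas.openxmlformats.org/package/2006/content-types}"


def content_types(docx_path):
    """Return {key: content type} for every Default/Override entry."""
    with zipfile.ZipFile(docx_path) as z:
        root = ET.fromstring(z.read("[Content_Types].xml"))
    entries = {}
    for el in root:
        if el.tag == CT_NS + "Default":
            # Defaults are keyed by file extension.
            entries["ext:" + el.get("Extension")] = el.get("ContentType")
        elif el.tag == CT_NS + "Override":
            # Overrides are keyed by the part they apply to.
            entries[el.get("PartName")] = el.get("ContentType")
    return entries


def diff_content_types(broken_path, working_path):
    """Return {key: (broken value, working value)} where the two differ."""
    broken = content_types(broken_path)
    working = content_types(working_path)
    return {
        key: (broken.get(key), working.get(key))
        for key in sorted(set(broken) | set(working))
        if broken.get(key) != working.get(key)
    }
```

Entries that exist in only one of the two documents show up with `None` on the other side, so parts like comments that only one file has are easy to tell apart from a genuinely different content type on the same part.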

From this MIME types document it seems like “application/vnd.ms-word.document.macroEnabled.12” is a “.docm” file, so at this point my guess is that the user copy-pasted something from another document and that possibly corrupted his destination document? I don’t know.

How to remove complex scripts from Word DOCX documents

Recently came across a Word document where some parts of the document seemed to ignore the general rules. The document was in English, and its language was set to English (U.S.) but certain parts were set to Arabic (Saudi) and none of the usual methods of selecting the text and marking it as English (U.S.) was helping. Very weird.

After a lot of fiddling around I also noticed that if I change the style of a paragraph containing such text, the adjoining text changes but this particular text stays as it is. I am able to change the font and size directly by applying them, but changes via styles seem to get ignored.

Then I realized that although this text was in English, since it was marked as Arabic (Saudi) it was being treated as a “complex script” in the style definitions and hence had separate rules. I guess that at some point someone had marked this text as Arabic (Saudi) and continued typing in English, or perhaps the original text was Arabic but someone had changed the font to an English one like “Times New Roman” and typed in English, so even though the text appeared as English, Word was in fact treating it as Arabic written in English (I guess). Anyway, the point is that Word was treating these blocks as complex scripts (as opposed to Latin for the other parts) and so the usual formatting rules didn’t apply to them. Moreover I could change the language from Arabic (Saudi) to Arabic (UAE), for instance, which seemed to support my theory that it was letting me change the language to other complex scripts, just not from complex to Latin or vice versa.

This being a DOCX file, it is really just a zip file. So I unzipped it using 7-Zip. Went to the word\styles.xml file in the extracted folder (which I came across through trial and error actually, I went through pretty much all the XML files there) and found the following:
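Instead of reading through every XML file, the language settings can be listed with a short script. This is only a sketch: it assumes the language is recorded in `w:lang` elements (where WordprocessingML normally keeps it, with `val`, `eastAsia` and `bidi` attributes), and the function name is made up. Note that inside the zip the path uses a forward slash, `word/styles.xml`.

```python
import zipfile
import xml.etree.ElementTree as ET

# Main WordprocessingML namespace (the "w:" prefix in the XML).
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def language_settings(docx_path, part="word/styles.xml"):
    """Return the attributes of every w:lang element in the given part."""
    with zipfile.ZipFile(docx_path) as z:
        root = ET.fromstring(z.read(part))
    found = []
    for lang in root.iter(W_NS + "lang"):
        # Drop the namespace from attribute names, leaving val/eastAsia/bidi.
        found.append({name.split("}")[-1]: value
                      for name, value in lang.attrib.items()})
    return found
```

Passing `part="word/document.xml"` would check the document body the same way, in case the language is set on individual runs rather than in the styles.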

Since I didn’t want the document to have any Arabic at all, I simply changed the “ar-SA” to “en-US”. Saved the XML file, went back to the extracted folder, and zipped all its contents up again. Renamed this from .zip to .docx and opened the document, and bingo! now all that complex stuff weirdness was gone! :)
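For anyone who would rather not unzip, edit and re-zip by hand, the same change can be sketched in Python. This is an illustration under a couple of assumptions: the part is UTF-8 encoded (which is what Word normally writes), a plain text replacement is good enough, and the function name `replace_in_part` is my own.

```python
import zipfile


def replace_in_part(src_path, out_path, old, new, part="word/styles.xml"):
    """Write a copy of src_path to out_path with `old` replaced by `new`
    inside one part. Returns how many occurrences were replaced."""
    count = 0
    with zipfile.ZipFile(src_path) as src, \
         zipfile.ZipFile(out_path, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            data = src.read(info.filename)
            if info.filename == part:
                # Assumption: the XML part is UTF-8 encoded.
                text = data.decode("utf-8")
                count = text.count(old)
                data = text.replace(old, new).encode("utf-8")
            dst.writestr(info.filename, data)
    return count
```

Calling it as `replace_in_part("in.docx", "out.docx", '"ar-SA"', '"en-US"')`, with the quotes included in the search string, limits the replacement to attribute values. The original file is left untouched, which is handy if Word doesn't like the result.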

(A note about zipping the folder back up. The format must be ZIP. Also, don’t zip the top-level folder itself, as then your zip file will contain the top-level folder followed by all the sub-folders. What we want is for the zip file to contain the sub-folders and files directly.)
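The zipping pitfall is easy to avoid in a script by storing each file under its path relative to the extracted folder. A minimal sketch in Python; the function name is hypothetical, and the output file should be written somewhere outside the folder being zipped.

```python
import os
import zipfile


def zip_folder_contents(folder, out_path):
    """Zip the contents of `folder` so entries sit at the archive root.

    out_path should live outside `folder`, otherwise the archive
    would try to include itself.
    """
    with zipfile.ZipFile(out_path, "w", zipfile.ZIP_DEFLATED) as z:
        for dirpath, _dirnames, filenames in os.walk(folder):
            for name in filenames:
                full = os.path.join(dirpath, name)
                # Path relative to the folder, with forward slashes,
                # e.g. "word/styles.xml" rather than "extracted/word/styles.xml".
                arcname = os.path.relpath(full, folder).replace(os.sep, "/")
                z.write(full, arcname)
```

The output can be named .docx directly, so there is no separate rename step.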