Search code examples
excelencryptionopenxml-sdkcorrupt

.xlsx file created in Excel for Widows then edited in Excel for Mac is no longer an archive file and cannot be opened by OpenXML SDK


I created a .xlsx file in Excel for Widows desktop app. The file uses named ranges, conditional formatting and some basic formulas (SUM, IF, NOT, ISBLANK, *, +). Most of the worksheet and all the workbook are locked with users only able to select and add data to certain cells.

The file was sent to someone who uses a MacBook Air (macOS Mojave v10.14.6) with Excel for Macintosh v12.3.6. He added his data, saved the changes (the file extension is still .xlsx) and sent the file back to me.

MacBook Air device details

The file is destined to be opened by the OpenXML SDK using SpreadsheetDocument.Open in a .Net Core 7.0 Azure Function. The first step that the SDK takes is to unzip the file, but the code was returning a corrupt file message.

using SpreadsheetDocument document = SpreadsheetDocument.Open(stream, false);

OpenXML SDK error:

System.IO.Packaging: File contains corrupted data.

  • at System.IO.Packaging.ZipPackage..ctor(Stream s, FileMode packageFileMode, FileAccess packageFileAccess)
  • at System.IO.Packaging.Package.Open(Stream stream, FileMode packageMode, FileAccess packageAccess)
  • at DocumentFormat.OpenXml.Packaging.OpenXmlPackage.OpenCore(Stream stream, Boolean readWriteMode)
  • at DocumentFormat.OpenXml.Packaging.SpreadsheetDocument.Open(Stream stream, Boolean isEditable, OpenSettings openSettings)
  • at ZipFiles.Program.Main(String[] args) in D:\Users\mark.cherry\Source\Repos\ZipFiles\ZipFiles\Program.cs:line 22

It was this that lead me to test opening the file using .Net Core System.IO.Compression.ZipArchive in a bare console application. This returns a similar error but with obvious differences.

using FileStream zipToOpen = new (@"Invoice - Fixed Fee.xlsx.zip", FileMode.Open);
using ZipArchive zip = new (zipToOpen, ZipArchiveMode.Read);

ZipArchive error:

System.IO.Compression: End of Central Directory record could not be found.

  • at System.IO.Compression.ZipArchive.ReadEndOfCentralDirectory()
  • at System.IO.Compression.ZipArchive..ctor(Stream stream, ZipArchiveMode mode, Boolean leaveOpen, Encoding entryNameEncoding)
  • at System.IO.Compression.ZipArchive..ctor(Stream stream, ZipArchiveMode mode)
  • at ZipFiles.Program.Main(String[] args) in D:\Users\mark.cherry\Source\Repos\ZipFiles\ZipFiles\Program.cs:line 12

As a further test of the file I changed the extension from .xlsx to .xlsx.zip and I get the same error 'The Compressed (zipped) Folder ... is invalid.':

Windows Explorer Error Message when trying to open the .xlsx file as .zip

The strange thing is, the original .xlsx file will open in Excel for Windows desktop app (v2308 (Build 16731.20234 Click-to-Run)):

Excel for Windows Version

If I open then save the corrupt file in Excel for Windows desktop app, then change the file extension to .xlsx.zip, the file will open as an archive. I did not change anything in the file, I just opened and saved it.

My understanding is that all .xlsx files are Microsoft Excel Open XML based on the OpenXML format and are archives containing several sub-directories and files.

I asked the same person to create a new blank workbook in Excel for Macintosh and send it to me. When I renamed the file extension of that file to .xlsx.zip, the file opens as an archive.

This is the only user who has returned the file with this problem. All other users are not experiencing this issue, but they may not be using MacBook Airs. I have tried to reproduce the error using iPads, but I could not create a file with the same error. I do not have a Mac so I am unable to diagnose the issue since opening and saving the file in Windows removes the fault.

Why is the locked file returning as something other than an archive when edited in Excel for Macintosh, and why does a blank workbook from the same Excel for Macintosh not show the same problem?

The password for the locked file does contain at least one !. Could this be a cryptography error while saving the file?

Any help to stop this file from becoming corrupt on this users Mac or to allow me to open the file in OpenXML on Windows would be much appreciated.

Test files can be accessed here: Excel for Mac Corruption

  • Book1.xlsx = File created on Mac that will open as an archive when renamed to .zip
  • Invoice - Fixed Fee.xlsx = The file that will not open as an archive when renamed to .zip
  • Invoice - Fixed Fee v1.1 - Unlocked.xlsx = Original file sent to user but the spreadsheet is unlocked.
  • Invoice - Fixed Fee v1.1 - Unlocked All.xlsx = Original file sent to user but is completely unlocked.
  • Invoice - Fixed Fee v1.1.xlsx = Locked file sent to user.

On the Excel community forum it was suggested that I look at the differences in the file's binary data between the Mac version and a version that has been opened and saved in Excel for Windows. The idea was to see if I could spot a difference that could cause the error. Unfortunately, there were too many differences for this method to be useful.

As detailed in the body of the question above, I have tried:

  • Renamed file to .zip and tried to open via Windows Explorer
  • Create a basic .Net Core v7.0 console app and tried to open the file using ZipArchive
  • Opened the file in Excel desktop app on Windows, saved the file, then renamed to .zip and opened. This worked but is not a suitable solution.
  • Compare the binary file data to see if there was a basic difference between the corrupt file from the Mac and the open and saved version from Windows.

Regards,

Mark

Addendum 1:

The protection in the Excel file is done at two levels:

  • Review > Protect Sheet
  • Review > Protect Workbook

By default, all the cells in spreadsheet are set to be locked if the spreadsheet is protected. To only allow the users to enter data in some cells I:

  1. Right click on the cell and select Format Cells… from the context menu.
  2. In the Format Cells modal, select the Protection tab.
  3. Deselect the Locked option.
  4. Select the OK button.

Format Cell Protection Option Dialog in Excel

Review > Protect Sheet

  1. From the Review tab of the top menu, select Protect Sheet from the ribbon.
  2. Enter a password in the Password (optional): text box.
  3. Select the Protect worksheet and contents of locked cells option.
  4. Only chose Select unlocked cells from the Allow all users of this worksheet to: list.
  5. Select the OK button.

Protect Sheet Dialog in Excel

Review > Protect Workbook

  1. From the Review tab of the top menu, select Protect Workbook from the ribbon.
  2. Enter a password in the Password (optional): text box.
  3. Select Structure from the Protect workbook for list.
  4. Select the OK button.

Protect Workbook Dialog in Excel

The instructions to the Mac user were to open the file then save it without entering any data.

I don’t think I am encrypting anything; I am only protecting the structure of the workbook and spreadsheet. This appears to be true when I look at the files in the .xlsx bundle (.zip archive). All of the .xml files for the protected document are in plain text and the named ranges and default data can be read with notepad.

Addendum 2:

I've had a chance to scan through the file specification documents and I understand what @tomjebo is driving at now. The Excel file that I protected is not encrypted. Once the Mac user opens it, adds data or doesn't add data, then saves the file again, something in his MS Office environment is adding encryption to the file, probably with a blank password. I assume this could be a default setting in Excel for Mac (I have no way of testing this), or it could be one of the items listed in the Introduction section of the MS-OFFCRYPTO specification file linked in the answer:

"The Office Document Cryptography Structure is relevant to documents that have Information Rights Management (IRM) policies, document encryption, or signing and write protection applied." (Page 8/119: [MS-OFFCRYPTO] - v20230815 Office Document Cryptography Structure Copyright © 2023 Microsoft Corporation Release: August 15, 2023)

As a test, I navigated to OneDrive for Business in a browser then opted to open the file the customer returned to me in Excel online ('Open in browser' option from the context menu). Excel Online asked me to provide a password. The customer did not provide me with a password, and Excel Online will not accept a blank password, so I couldn't open the file. Since the file opens in Excel for Windows on my desktop without asking for a password, I assume that the password is blank. The file is definitely encrypted.

As a side note, I tried to open the file in an Azure Logic App then transfer it back to the Azure Storage Account as a new blob to see if I could use Excel Online to remove the encryption. This does not work because the ‘Get worksheets' action for Excel for Business in the Logic App throws the following HTTP 403 error:

"The request is forbidden by Graph API. Error code is 'PasswordProtectedWorkbook'. Error message is 'Forbidden'."

Since I do not have the time or the knowledge to be able to write code to discover the file's encryption type then unencrypt the file, I am left with two alternatives.

  1. I can abandon the use of OpenXML SDK and use a paid-for third party library to work with the files.
  2. I can catch the exception that is thrown and pass the file to co-workers for manual processing.

Solution

  • When Excel (and Office in general) encrypts or protects a document, it converts it to a Compound Binary File format. This is no longer a Zip archive. You can see the signature in the first several bytes of the file as 0xD0CF11E0A1B11AE1 (side note, 0xD0CF11E is supposed to read like "doc file" and is the signature for CFB). The Open XML SDK does not open/read encrypted or protected Office documents as do not Zip archive utilities. This is why your document won't open.

    You can read about the encryption/protection process in MS-OFFCRYTPO and you can read about the CFB format (the file format used before Office Open XML was adopted) in MS-CFB

    Hope this helps.