
xlsx compressed by rubyzip not readable by Excel


I am writing code that reads and writes Excel xlsx files. An xlsx file is simply a zip archive of several XML files, so to test whether I could write one, I used a gem called rubyzip to unzip an xlsx file and then immediately zip it back up into a new archive, without modifying the data. When I do this, however, I cannot open the new Excel file; Excel reports it as corrupted.

Alternatively, if I use Mac OS X's Archive Utility (the native application for handling zip files) to unzip and re-zip an Excel file, the data is not corrupted and I can open the resulting file in Excel.

I have found that it is not the 'unzip' functionality of rubyzip that "corrupts" the data, but the zip process. (In fact, when I use Archive Utility to unzip and re-zip the archive that rubyzip creates, the file is again readable by Excel.)

I'm wondering why this happens, and what solutions there could be to zip the contents programmatically in a way which is readable by Excel.

My code for zipping:

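require 'fileutils'
require 'zip/zip'  # rubyzip < 1.0, which provides Zip::ZipFile

def compress(path)
    path.sub!(%r[/$],'')  # strip a trailing slash
    archive = File.join(path,File.basename(path))+'.zip'
    FileUtils.rm archive, :force=>true  # remove any previous archive
    Zip::ZipFile.open(archive, 'w') do |zipfile|
        # note: this glob matches directories as well as files
        Dir["#{path}/**/**"].reject{|f|f==archive}.each do |file|
            zipfile.add(file.sub(path+'/',''),file)
        end
    end
end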

Solution

  • The OOXML format imposes a number of constraints on how Zip is used in order for a package to be conformant. For example, the only compression method permitted in the package is DEFLATE.

    You might want to check the specification for OPC packages (which .xlsx files are) in Annex C of the standard (available as a Zip download), and then ensure that the rubyzip library is not doing anything that is not permitted (such as using the IMPLODE compression method).
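
One way to check what rubyzip actually wrote is to read the compression method of every entry straight from the archive's central directory. The following is only a rough sketch using the Ruby standard library: the method names `zip_entry_methods` and `non_deflate_entries` are made up for this illustration, and it assumes a plain single-disk archive (no Zip64). It only reports entries whose method is not DEFLATE (method 8); whether a given entry is actually acceptable is for Annex C to decide.

```ruby
# Returns [entry_name, compression_method] pairs read from the central
# directory of a Zip archive passed in as a string of raw bytes.
# Sketch only: no Zip64 and no multi-disk support.
def zip_entry_methods(data)
  data = data.b
  eocd_sig = [0x06054b50].pack('V') # end-of-central-directory signature
  cdir_sig = [0x02014b50].pack('V') # central directory header signature

  eocd = data.rindex(eocd_sig)
  raise ArgumentError, 'no end-of-central-directory record found' if eocd.nil?

  # Entry count, central directory size, central directory offset.
  total, _cd_size, pos = data[eocd + 10, 10].unpack('vVV')

  Array.new(total) do
    raise ArgumentError, 'bad central directory header' unless data[pos, 4] == cdir_sig

    method = data[pos + 10, 2].unpack1('v')
    name_len, extra_len, comment_len = data[pos + 28, 6].unpack('vvv')
    name = data[pos + 46, name_len]
    # Advance past the 46-byte fixed header and the variable-length fields.
    pos += 46 + name_len + extra_len + comment_len
    [name, method]
  end
end

# Entries whose compression method is anything other than DEFLATE (8).
def non_deflate_entries(data, deflate: 8)
  zip_entry_methods(data).reject { |_name, method| method == deflate }
end
```

For example, `non_deflate_entries(File.binread('rezipped.xlsx'))` (with `rezipped.xlsx` standing in for whatever file rubyzip produced) lists the entries worth a closer look. Running the same check on the archive written by Archive Utility gives a baseline to compare against.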