RubyZip docx issues with write_buffer instead of open


I'm adapting the RubyZip recursive zipping example (found here) to use write_buffer instead of open, and I'm running into a host of issues. I'm doing this because the zip archive I'm producing contains Word documents, and I get errors when opening them. So I'm trying the workaround that RubyZip suggests, which is to use write_buffer instead of open (example found here).

The first problem is that I'm getting errors because I'm using an absolute path, and I'm not sure how to get around that. The error is "#//', name must not start with />"

Second, I'm not sure how to fix the issue with Word documents. My original code worked and created an actual zip file, but any Word document in that zip file showed the following error upon opening: "Word found unreadable content in Do you want to recover the contents of this document? If you trust the source of this document, click Yes." That unreadable content error is the reason I started trying to use write_buffer.

Any help would be appreciated.

Here is the code that I'm currently using:

require 'zip'
require 'zip/zipfilesystem'

module AdvisoryBoard
  class ZipService
    def initialize(input_dir, output_file)
      @input_dir = input_dir
      @output_file = output_file
    end

    # Zip the input directory.
    def write
      entries = Dir.entries(@input_dir) - %w[. ..]
      path = ""

      buffer = Zip::ZipOutputStream.write_buffer do |zipfile|
        entries.each do |e|
          zipfile_path = path == '' ? e : File.join(path, e)
          disk_file_path = File.join(@input_dir, zipfile_path)

          @file = nil
          @data = nil

          if !File.directory?(disk_file_path)
            @file = File.open(disk_file_path, "r+b")
            @data = @file.read

            unless [@output_file, @input_dir].include?(e)
              zipfile.put_next_entry(e)
              zipfile.write @data
            end

            @file.close
          end
        end

        zipfile.put_next_entry(@output_file)

        zipfile.put_next_entry(@input_dir)
      end

      File.open(@output_file, "wb") { |f| f.write(buffer.string) }
    end
  end
end
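
The "name must not start with /" error is consistent with how the block above ends: put_next_entry is called with @output_file and @input_dir, and RubyZip rejects entry names that begin with a slash. Entry names inside an archive are expected to be relative. Below is a minimal sketch of deriving a relative name from an absolute path, assuming the file lives under the input directory. The helper entry_name_for and the example paths are hypothetical and not part of RubyZip.

```ruby
require 'pathname'

# Hypothetical helper (not part of RubyZip): turn an absolute path on disk
# into an archive entry name relative to the directory being zipped.
def entry_name_for(input_dir, disk_file_path)
  Pathname.new(disk_file_path)
          .relative_path_from(Pathname.new(input_dir))
          .to_s
end

# Example paths are made up; they do not need to exist on disk.
puts entry_name_for('/tmp/reports', '/tmp/reports/2014/summary.docx')
# prints 2014/summary.docx
```

The returned name is what would be passed to put_next_entry in place of an absolute path. The two trailing put_next_entry calls for @output_file and @input_dir could presumably be dropped, since they only add entries with no content.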

Solution

  • I was able to get Word documents to open without any warnings or corruption! Here's what I ended up doing:

    require 'nokogiri'
    require 'zip'
    
      class ZipService
        # Initialize with the directory to zip and the location of the output archive.
        def initialize(input_dir, output_file)
          @input_dir = input_dir
          @output_file = output_file
        end
    
        # Zip the input directory.
        def write
          entries = Dir.entries(@input_dir) - %w[. ..]
    
          ::Zip::File.open(@output_file, ::Zip::File::CREATE) do |zipfile|
            write_entries entries, '', zipfile
          end
        end
    
        private
    
        # A helper method to make the recursion work.
        def write_entries(entries, path, zipfile)
          entries.each do |e|
            zipfile_path = path == '' ? e : File.join(path, e)
            disk_file_path = File.join(@input_dir, zipfile_path)
    
            if File.directory? disk_file_path
              recursively_deflate_directory(disk_file_path, zipfile, zipfile_path)
            else
              put_into_archive(disk_file_path, zipfile, zipfile_path, e)
            end
          end
        end
    
        def recursively_deflate_directory(disk_file_path, zipfile, zipfile_path)
          zipfile.mkdir zipfile_path
          subdir = Dir.entries(disk_file_path) - %w[. ..]
          write_entries subdir, zipfile_path, zipfile
        end
    
        def put_into_archive(disk_file_path, zipfile, zipfile_path, entry)
          if File.extname(zipfile_path) == ".docx"
            # A .docx file is itself a zip archive. Re-serialize its main
            # document part through Nokogiri before adding it. Note that this
            # rewrites the source file on disk.
            Zip::File.open(disk_file_path) do |zip|
              doc = zip.read("word/document.xml")
              xml = Nokogiri::XML.parse(doc)
              zip.get_output_stream("word/document.xml") { |f| f.write(xml.to_s) }
            end
          end
    
          zipfile.add(zipfile_path, disk_file_path)
        end
      end
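
The recursion in write_entries and recursively_deflate_directory can also be expressed with Ruby's standard Find module. The sketch below only collects relative entry names and their paths on disk. It does not call RubyZip, and archive_entries is a hypothetical helper name. Unlike the code above, it produces no explicit directory entries, so whether that matters for a given archive is an open assumption.

```ruby
require 'find'
require 'pathname'
require 'tmpdir'
require 'fileutils'

# Hypothetical helper: walk input_dir recursively and return a hash that maps
# each relative entry name to the corresponding file path on disk.
# Directories themselves are skipped; only regular files are collected.
def archive_entries(input_dir)
  root = Pathname.new(input_dir)
  entries = {}
  Find.find(input_dir) do |path|
    next unless File.file?(path)
    entries[Pathname.new(path).relative_path_from(root).to_s] = path
  end
  entries
end

# Demonstration on a throwaway directory with one nested file.
Dir.mktmpdir do |dir|
  FileUtils.mkdir_p(File.join(dir, 'letters'))
  File.write(File.join(dir, 'notes.txt'), 'hello')
  File.write(File.join(dir, 'letters', 'offer.docx'), 'placeholder')
  archive_entries(dir).each { |name, path| puts "#{name} <- #{path}" }
end
```

Each name and path pair could then be handed to zipfile.add(zipfile_path, disk_file_path), as put_into_archive does.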