Tags: ruby-on-rails, pdf-reader, rubyzip

Is it possible to read a PDF file inside a zip archive with pdf-reader?


Is it possible to read a PDF file inside a zip file with pdf-reader? I tried the code below, but it does not work.

require 'zip'
require 'pdf-reader'

Zip::File.open('/path/to/zipfile') do |zip_file|
  zip_file.each do |entry|
    if entry.directory?
      puts "#{entry.name} is a folder!"
    elsif entry.symlink?
      puts "#{entry.name} is a symlink!"
    elsif entry.file?
      puts "#{entry.name} is a regular file!"

      # This is the call that fails: entry.name is a path inside the archive,
      # not a file on disk.
      reader = PDF::Reader.new(entry.name)
      reader.pages.each do |page|
        puts page.text
      end
    else
      puts "#{entry.name} is something unknown"
    end
  end
end

Thanks


Solution

  • PDF::Reader only accepts input that is an "IO-like object or a filename", and it decides which with two checks:

    • The input is "IO-like" if it responds to both seek and read.
    • Otherwise, the input is treated as a filename if File.file?(input.to_s) returns true.

    Source excerpt from pdf-reader:

    def extract_io_from(input)
      if input.respond_to?(:seek) && input.respond_to?(:read)
        input
      elsif File.file?(input.to_s)
        StringIO.new read_as_binary(input)
      else
        raise ArgumentError, "input must be an IO-like object or a filename"
      end
    end
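    To see the two checks in isolation, here is a stdlib-only sketch. The helper pdf_reader_accepts? is hypothetical: it only mirrors the logic of the excerpt and is not part of pdf-reader's API. ForwardOnlyStream is likewise a hypothetical stand-in for Zip::InputStream, assuming (as described in this answer) that the real class can be read but not seeked.

```ruby
require 'stringio'

# Hypothetical helper mirroring the two checks in the excerpt (not pdf-reader's API):
# accept the input if it responds to #seek and #read, or if its string form
# names an existing regular file.
def pdf_reader_accepts?(input)
  (input.respond_to?(:seek) && input.respond_to?(:read)) || File.file?(input.to_s)
end

# Hypothetical stand-in for Zip::InputStream: it has #read but no #seek.
class ForwardOnlyStream
  def initialize(data)
    @data = data
  end

  def read
    @data
  end
end

stream = ForwardOnlyStream.new('%PDF-1.4 ...')
pdf_reader_accepts?(stream)                    # => false, no #seek
pdf_reader_accepts?('docs/report.pdf')         # => false unless that path exists on disk
pdf_reader_accepts?(StringIO.new(stream.read)) # => true, StringIO has #seek and #read
```

    The second call is the question's situation: an entry name is just a string, so it only passes if a file with that exact path happens to exist in the current directory.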
    

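    Passing entry.name, as the question does, fails the second check: it is a path inside the archive, not a file on disk. Passing entry.get_input_stream fails as well, because although Zip::InputStream emulates an IO object fairly well, it does not define seek and therefore does not pass the first check. Instead, read the entry's contents into memory and wrap them in a new StringIO: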

    StringIO.new(entry.get_input_stream.read)
    

    Because StringIO responds to both seek and read, PDF::Reader accepts it as an "IO-like object" and parses it normally.