Tags: ruby, encoding, optparse

Encoding problems with Ruby when reading command-line arguments with optparse


I'm writing a small program in Ruby that modifies some files inside a zip file. The zip file is passed as a command-line parameter and parsed with OptionParser.

The problem is that when I specify a file whose name contains non-ASCII characters, it cannot be opened: Ruby reports that the file could not be found. This happens in cmd.exe on Windows.

Here is a minimal example:

# example.rb
require "zip"       # provided by the rubyzip gem
require "optparse"

zip_file_name = String.new

# Read and interpret the command-line arguments:
OptionParser.new do |opts|
  opts.on("-f", "--file FILE", String, "The zip file to modify") do |f|
    zip_file_name = f
  end
end.parse!

# Open the zip file (the actual modifications would go inside the block):
Zip::File.open(zip_file_name) do |zipfile|
end

If you create a zip file test.zip and run example.rb -f test.zip, everything works (the script finishes without errors). Doing the same with a zip file named täst.zip fails with a "file not found" error. I tried zip_file_name.encode!(Encoding::UTF_8), but that didn't solve the problem.
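A likely reason encode! does not help: encode transcodes starting from the encoding the string is labelled with, so if the label is wrong, the result is wrong too. The sketch below simulates that situation under an assumption taken from the accepted solution further down, namely that the bytes Ruby received are Windows-1252 while the string is labelled CP850. The variable names are illustrative only.

```ruby
# Simulate the assumed situation: the bytes are Windows-1252
# ("ä" is the single byte 0xE4), but the string is labelled CP850.
arg = "t\xE4st.zip".b.force_encoding("CP850")

# encode reads the bytes according to the label. In CP850 the byte
# 0xE4 is "õ", so this produces the name of a file that does not exist.
transcoded = arg.encode("UTF-8")

# force_encoding only changes the label and leaves the bytes alone.
# Relabelling first and then transcoding gives the intended name.
relabelled = arg.dup.force_encoding("Windows-1252").encode("UTF-8")

puts transcoded  # prints tõst.zip
puts relabelled  # prints täst.zip
```

In short, force_encoding fixes a wrong label, while encode converts between two correct ones; this case needs the former before the latter.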

It seems to be an encoding problem (zip_file_name.encoding reports CP850), but transcoding the string does not produce the correct file name.
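One way to check whether the reported encoding actually matches the data is to print the label together with the raw bytes. The helper below (describe_encoding is a hypothetical name, not part of Ruby or OptionParser) is a minimal sketch of that check; the byte values in the comments are the standard encodings of "ä" in each code page.

```ruby
# Hypothetical helper: return the encoding label Ruby attached to a
# string together with its raw bytes in hex, so the two can be compared.
def describe_encoding(str)
  hex = str.bytes.map { |b| format("%02X", b) }.join(" ")
  "#{str.encoding.name}: #{hex}"
end

# In example.rb this would go right after parse!:
#   puts describe_encoding(zip_file_name)
# For "täst.zip", the byte for "ä" hints at the real encoding,
# whatever label Ruby reports:
#   E4     -> Windows-1252 (or ISO 8859-1)
#   84     -> CP850
#   C3 A4  -> UTF-8
sample = "t\xE4st.zip".b.force_encoding("CP850")
puts describe_encoding(sample)
```

If the label says CP850 but the byte is E4, the label is wrong and transcoding alone cannot fix it.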

So my question is: how can I change my program so that file names containing non-ASCII characters can be passed on the command line?


Solution

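  • Adding zip_file_name.force_encoding(Encoding::Windows_1252) before opening the file solves the issue (on a Western European Windows installation).

    Apparently, Ruby wrongly assumes that the file name is encoded in CP850 (the cmd.exe console code page). On my Windows system, file names seem to be encoded in Windows-1252 (Microsoft's extension of Latin-1, ISO 8859-1), so the string only needs to be relabelled, not transcoded. This is also why encode! did not help: it converts from the wrong label, and the byte that means "ä" in Windows-1252 (0xE4) means "õ" in CP850.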
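A minimal sketch of how the fix could be folded into the option parsing, assuming (as in the answer) that the arguments arrive as Windows-1252 bytes with a wrong label. ASSUMED_ARGV_ENCODING, normalize_path_arg and parse_zip_file_name are hypothetical names. Whether the relabelling is needed at all likely depends on the Ruby version and the system's code pages, so the helper leaves strings that are already valid UTF-8 untouched instead of relabelling them; relabelling real UTF-8 bytes as Windows-1252 would turn "ä" into "Ã¤".

```ruby
require "optparse"

# Assumption: the actual encoding of the argument bytes on the affected
# system. Adjust it to the Windows ANSI code page if it differs.
ASSUMED_ARGV_ENCODING = "Windows-1252"

# Return a UTF-8 copy of a file-name argument.
def normalize_path_arg(arg, actual = ASSUMED_ARGV_ENCODING)
  # Already correct UTF-8: nothing to fix.
  return arg.dup if arg.encoding == Encoding::UTF_8 && arg.valid_encoding?

  # Otherwise relabel a copy (bytes unchanged), then transcode to UTF-8.
  arg.dup.force_encoding(actual).encode("UTF-8")
end

# Parse the -f/--file option from the given argument array.
def parse_zip_file_name(argv)
  zip_file_name = nil
  OptionParser.new do |opts|
    opts.on("-f", "--file FILE", String, "The zip file to modify") do |f|
      zip_file_name = normalize_path_arg(f)
    end
  end.parse!(argv)
  zip_file_name
end

# Usage in example.rb (rubyzip, as in the question):
#   name = parse_zip_file_name(ARGV)
#   Zip::File.open(name) do |zipfile|
#     # ...
#   end
```

Keeping the parsing in a method that takes the argument array, rather than reading ARGV directly, makes it easy to try out with hand-built byte strings before running it in cmd.exe.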