I know that I can invoke a Ruby script as follows to set the default string encoding that will be in effect while the script runs:
ruby -Eencoding script.rb args ...
However, is there some call that I could make inside script.rb itself to set the encoding in the same way when I run the script directly as follows, instead of through the ruby interpreter?
./script.rb args ...
In other words, I want all string operations performed while script.rb is running to use an encoding that I specify at run time inside script.rb. This includes all string operations performed inside the Ruby methods, functions, and internals that are called while script.rb is executing.
For example, assume that I implement an optional -e command-line argument to script.rb that contains the name of an encoding, such as ISO-8859-1 or UTF-8 or any other valid encoding. I could then run the script as follows:
./script.rb -eISO-8859-1 args ...
And another time, I might run that same script like this:
./script.rb -eUTF-8 args ...
Inside the script, I could use OptionParser or GetoptLong or some other code to parse the command-line arguments and extract that optional encoding argument. If that argument has been passed in, I then want to call some sort of Ruby function to make it the encoding used in all subsequent string operations, including operations in all of Ruby's internal methods, functions, and modules.
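The option handling described above could look roughly like the following sketch. The helper name extract_encoding is hypothetical; Encoding.find and OptionParser are standard Ruby. What is still missing is the call that would apply the resulting encoding everywhere.

```ruby
require 'optparse'

# Hypothetical helper (not part of Ruby): pulls an optional -e ENCODING
# out of the argument list and returns [Encoding or nil, remaining args].
def extract_encoding(args)
  encoding = nil
  parser = OptionParser.new do |opts|
    opts.on('-e ENCODING', 'name of the string encoding to use') do |name|
      # Encoding.find raises ArgumentError for an unknown encoding name.
      encoding = Encoding.find(name)
    end
  end
  rest = parser.parse(args)
  [encoding, rest]
end

encoding, rest = extract_encoding(['-eISO-8859-1', 'file.txt'])
puts encoding.inspect
puts rest.inspect
```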
I could not find any discussion of any such function in Ruby. Does such a function even exist in Ruby?
Also, I don't want to have to set an environment variable outside of script.rb in order to tell it which string encoding to use. I want this choice of string encoding to be made by executable Ruby code that I invoke at run time, from inside script.rb.
Thank you in advance for any ideas and suggestions.
UPDATE: I tried setting Ruby's default_internal encoding per the suggestion by @Stefan, but it doesn't work. Here's some sample code that illustrates the problem:
#!/opt/local/rubies/ruby-3.3.0/bin/ruby
# -*- ruby -*-
Encoding.default_internal = Encoding::ISO_8859_1
require 'optparse'
parser = OptionParser.new {
}
parser.parse!
Process.exit(0)
Assume this code is in a file called rtest.rb, and assume that LC_TYPE and LC_CTYPE are set to 'UTF-8' in the environment. Also assume that I have a file in my current directory whose name is encoded as an ISO-8859-1 string. Then, I run the script as follows:
./rtest.rb *
What results is this error:
/opt/local/rubies/ruby-3.3.0/lib/ruby/3.3.0+0/optparse.rb:1640:in `===': invalid byte sequence in UTF-8 (ArgumentError)
from /opt/local/rubies/ruby-3.3.0/lib/ruby/3.3.0+0/optparse.rb:1640:in `block in parse_in_order'
from /opt/local/rubies/ruby-3.3.0/lib/ruby/3.3.0+0/optparse.rb:1636:in `catch'
from /opt/local/rubies/ruby-3.3.0/lib/ruby/3.3.0+0/optparse.rb:1636:in `parse_in_order'
from /opt/local/rubies/ruby-3.3.0/lib/ruby/3.3.0+0/optparse.rb:1630:in `order!'
from /opt/local/rubies/ruby-3.3.0/lib/ruby/3.3.0+0/optparse.rb:1739:in `permute!'
from /opt/local/rubies/ruby-3.3.0/lib/ruby/3.3.0+0/optparse.rb:1764:in `parse!'
from ./rtest.rb:10:in `<main>'
FURTHER UPDATE: I completely unset all of the LC_* environment variables and then re-ran the script. I got the same error.
First, you should file a bug report with the Ruby developers. OptionParser shouldn't crash because of encoding problems when dealing with paths. A path can contain any byte (excluding the NUL byte), so OptionParser#parse! should always treat its arguments as ASCII-8BIT, independently of the encoding that you choose for the program.
Edit: As a workaround, I would recommend overriding the encoding of the arguments manually:
ARGV.map! { |s| s.dup.force_encoding(Encoding::ASCII_8BIT).freeze }
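To illustrate the workaround without needing a specially named file on disk, here is a minimal sketch that applies the same retagging to a simulated argument list instead of ARGV. The helper name binary_args and the -v option are made up for the example.

```ruby
require 'optparse'

# Hypothetical helper (not part of Ruby or OptionParser): returns copies of
# the arguments retagged as binary, leaving the bytes themselves untouched.
def binary_args(args)
  args.map { |s| s.dup.force_encoding(Encoding::ASCII_8BIT).freeze }
end

# Simulated command line. The bytes "caf\xE9.txt" spell "café.txt" in
# ISO-8859-1; tagged as UTF-8 they form an invalid byte sequence.
raw_args = ['-v', "caf\xE9.txt".dup.force_encoding(Encoding::UTF_8)]
args = binary_args(raw_args)

options = {}
parser = OptionParser.new do |opts|
  opts.on('-v', '--verbose') { options[:verbose] = true }
end

# parse (without the bang) works on a copy and returns the non-option arguments.
remaining = parser.parse(args)

puts options.inspect
puts remaining.map(&:encoding).inspect
```

The file name keeps its original bytes, so it can still be passed to File.open and similar methods; only string operations that care about character boundaries see it as raw bytes.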
Second, the only way I found to set the interpreter's default encoding "from inside" script.rb is to add the -E option to the shebang line, though that only takes effect when the script is run as an executable:
#!/usr/bin/ruby -EASCII-8BIT
Warning: on Linux, at least, a shebang line passes everything after the interpreter path as a single argument, so for example #!/usr/bin/env ruby -EASCII-8BIT doesn't work.
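For what it's worth, Encoding.default_external can also be assigned from Ruby code, but as far as I can tell that only affects strings created afterwards (for example, data read from files); strings that already exist, such as the ones in ARGV, keep the encoding they were given. A small sketch of that behaviour, using a temporary file as a stand-in for real input:

```ruby
require 'tempfile'

# A string that exists before the change, standing in for an ARGV entry:
# ISO-8859-1 bytes tagged as UTF-8, which makes it an invalid sequence.
before = "caf\xE9".dup.force_encoding(Encoding::UTF_8)

saved = Encoding.default_external
begin
  Encoding.default_external = Encoding::ISO_8859_1

  # Data read after the change is tagged with the new default.
  after = Tempfile.create('enc-demo') do |f|
    f.binmode
    f.write("caf\xE9".b)
    f.flush
    File.read(f.path)
  end
ensure
  # Restore the previous default so the demo has no lasting effect.
  Encoding.default_external = saved
end

puts "before: #{before.encoding}, valid: #{before.valid_encoding?}"
puts "after:  #{after.encoding}, valid: #{after.valid_encoding?}"
```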
Now, for passing the encoding as an argument to the script, I see no other way than to wrap the program in another script. Here's an example with a shell script:
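#!/bin/sh
# The first argument must be the interpreter option (e.g. -EASCII-8BIT).
encoding_opt=$1
shift
# "-" tells ruby to read the program from stdin, so the remaining
# arguments end up in ARGV instead of being taken as the script name.
cat <<'EOF' | ruby "$encoding_opt" - "$@"
require 'optparse'
parser = OptionParser.new { }
parser.parse!
Process.exit(0)
EOF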
Then you can call it like this:
./script.sh -EASCII-8BIT -- filename_with_non_utf8_characters ...