I am trying to identify file encoding in Ruby.
file = File.open("filePath", "r")
file.external_encoding
But I never get UTF-8-BOM as the encoding, even when my file is in that encoding. I get everything but UTF-8-BOM. Could it be that UTF-8-BOM is not supported in Ruby? I don't need to open or read the file, only to identify its encoding.
What do you mean by “UTF-8-BOM” encoding? It is, in fact, plain old UTF-8, just prefixed with the byte order mark (EF BB BF). The BOM has no effect on UTF-8, because UTF-8 has no byte-order ambiguity, and using a BOM in UTF-8 is not recommended. To sum up: there is no such encoding; there is only a byte order mark, which is meant to help identify the endianness of an encoding.
File.open is a general-purpose stream opener and it does not guess anything. It can be told to use an explicit encoding (useful for single-byte encodings), and it can determine the endianness of an encoding based on the BOM (useful for fixed-width Unicode encodings).
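As a rough illustration of those two options, here is a minimal sketch. The helper name `read_with` and its `path` argument are made up for this example; the mode strings use Ruby's `"r:ENCODING"` and `"r:BOM|UTF-8"` forms.

```ruby
# Hypothetical helper (not part of Ruby): open `path` with the given mode
# string and return the external encoding Ruby settled on plus the text.
def read_with(path, mode)
  File.open(path, mode) do |f|
    text = f.read
    [f.external_encoding, text]
  end
end

# "r:ISO-8859-1" -> explicit encoding, nothing is detected
# "r:UTF-8"      -> explicit UTF-8; a leading BOM stays in the text as U+FEFF
# "r:BOM|UTF-8"  -> a leading BOM is checked and consumed; UTF-8 is the fallback
```

With a file that starts with EF BB BF, the `"r:BOM|UTF-8"` mode should report UTF-8 and return the text without the mark, while plain `"r:UTF-8"` leaves U+FEFF as the first character.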
If you want to check whether the file has a BOM, read the first 3 bytes from it and compare them against EF BB BF.
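A minimal sketch of that check, assuming a hypothetical helper named `utf8_bom?` and a constant `UTF8_BOM` (both introduced here for illustration only):

```ruby
# The three bytes of the UTF-8 byte order mark, as a binary string.
UTF8_BOM = "\xEF\xBB\xBF".b

# Hypothetical helper: true if the file at `path` starts with EF BB BF.
def utf8_bom?(path)
  # Read at most the first 3 bytes in binary mode; no decoding takes place.
  File.binread(path, 3) == UTF8_BOM
end
```

Since only three bytes are read, the rest of the file is never loaded or decoded. A file shorter than three bytes simply fails the comparison.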