I have a C++ program that reads text files. Currently I'm using C's fopen() to open the file and then fgetc() to read the next character. I typedef'd a "file character" type, which is actually an int (and I could change it to long without problems, obviously).
Now the program can read UTF-7 and UTF-8 text files, but what if I use UTF-16 or UTF-32 text files? Is there a way to infer the file encoding and then read the file properly?
Even switching to C++'s istreams wouldn't be a problem.
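You cannot infer the encoding with certainty, but in practice you can use trial and error: try to decode the file with each encoding from a list of candidates, and keep the first one that decodes without errors.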
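As an illustration of one such trial, here is a minimal sketch of a strict UTF-8 validity check. The function name is_valid_utf8 is made up for this example, and it assumes the whole file has already been read into a byte buffer. Passing the check does not prove the file is UTF-8 (plain ASCII passes too), but UTF-16 or UTF-32 text containing non-ASCII characters will usually fail it, so you can then move on to the next candidate.

```cpp
#include <cstddef>

// Hypothetical helper: returns true if the n bytes at s form well-formed UTF-8.
// Rejects truncated sequences, stray continuation bytes, overlong forms,
// UTF-16 surrogates and code points above U+10FFFF.
bool is_valid_utf8(const unsigned char* s, std::size_t n) {
    // Smallest code point that legitimately needs 1, 2 or 3 continuation bytes.
    static const unsigned long min_cp[] = {0x0, 0x80, 0x800, 0x10000};

    std::size_t i = 0;
    while (i < n) {
        const unsigned char c = s[i];
        if (c < 0x80) {          // plain ASCII byte
            ++i;
            continue;
        }

        std::size_t extra = 0;   // number of continuation bytes expected
        unsigned long cp = 0;    // code point being assembled
        if ((c & 0xE0) == 0xC0)      { extra = 1; cp = c & 0x1F; }
        else if ((c & 0xF0) == 0xE0) { extra = 2; cp = c & 0x0F; }
        else if ((c & 0xF8) == 0xF0) { extra = 3; cp = c & 0x07; }
        else return false;       // stray continuation byte or invalid lead byte

        if (n - i <= extra) return false;  // sequence runs past the end

        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char cc = s[i + k];
            if ((cc & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (cc & 0x3F);
        }

        if (cp < min_cp[extra]) return false;              // overlong encoding
        if (cp > 0x10FFFF) return false;                   // outside Unicode
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;    // surrogate

        i += extra + 1;
    }
    return true;
}
```

If this returns false, the file is definitely not UTF-8; if it returns true, UTF-8 (or ASCII) is only a plausible guess.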
Additionally, UTF-encoded files are permitted (but not required) to begin with a byte order mark (BOM): https://en.wikipedia.org/wiki/Byte_order_mark . If the file has one, you are lucky, because the BOM's byte sequence differs between the encodings.
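A minimal sketch of BOM sniffing, sticking with fopen() as in the question. The names Encoding, detect_bom and detect_file_bom are invented for this example; the byte sequences themselves are the standard BOMs for UTF-8, UTF-16 and UTF-32. A result of Unknown only means there is no BOM, not that the file isn't Unicode.

```cpp
#include <cstddef>
#include <cstdio>

// Hypothetical result type for this sketch.
enum class Encoding { Unknown, Utf8, Utf16LE, Utf16BE, Utf32LE, Utf32BE };

// Looks for a byte order mark in the first n bytes of b.
// bom_len receives the number of bytes to skip before the actual text.
Encoding detect_bom(const unsigned char* b, std::size_t n, std::size_t& bom_len) {
    bom_len = 0;
    // UTF-32 must be tested before UTF-16, because the UTF-32LE BOM
    // (FF FE 00 00) starts with the UTF-16LE BOM (FF FE). Note the inherent
    // ambiguity: a UTF-16LE file whose first character is U+0000 looks the same.
    if (n >= 4 && b[0] == 0xFF && b[1] == 0xFE && b[2] == 0x00 && b[3] == 0x00) {
        bom_len = 4;
        return Encoding::Utf32LE;
    }
    if (n >= 4 && b[0] == 0x00 && b[1] == 0x00 && b[2] == 0xFE && b[3] == 0xFF) {
        bom_len = 4;
        return Encoding::Utf32BE;
    }
    if (n >= 3 && b[0] == 0xEF && b[1] == 0xBB && b[2] == 0xBF) {
        bom_len = 3;
        return Encoding::Utf8;
    }
    if (n >= 2 && b[0] == 0xFF && b[1] == 0xFE) {
        bom_len = 2;
        return Encoding::Utf16LE;
    }
    if (n >= 2 && b[0] == 0xFE && b[1] == 0xFF) {
        bom_len = 2;
        return Encoding::Utf16BE;
    }
    return Encoding::Unknown;
}

// Opens the file in binary mode, sniffs the BOM, and returns the encoding.
// Returns Unknown if the file cannot be opened or has no BOM.
Encoding detect_file_bom(const char* path, std::size_t& bom_len) {
    bom_len = 0;
    std::FILE* f = std::fopen(path, "rb");
    if (!f) return Encoding::Unknown;
    unsigned char head[4] = {0, 0, 0, 0};
    const std::size_t got = std::fread(head, 1, sizeof head, f);
    std::fclose(f);
    return detect_bom(head, got, bom_len);
}
```

After detection you would reopen (or seek) the file, skip bom_len bytes, and decode 1, 2 or 4 bytes per code unit according to the result. When no BOM is found, fall back to guessing.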