Our servers run CentOS, and our Java backend sometimes has to process a file that was originally generated on a Windows machine (by one of our clients) using CP-1252. In more than 95% of cases, however, we are processing UTF-8 files.
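My question: if we know that certain files will always be UTF-8, and other files will always be CP-1252, is it possible to specify in Java the character set to use for reading in each file? If so: do we need to do anything at the systems-level for adding CP-1252 to CentOS? If so, what does this involve?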
Thanks in advance!
My question: if we know that certain files will always be UTF-8, and other files will always be CP-1252, is it possible to specify in Java the character set to use for reading in each file?
Assuming you're in charge of the code reading the file, it should be fine. Create a FileInputStream, then wrap it in an InputStreamReader specifying the relevant character encoding.
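As a rough sketch of that approach (the class and method names here are made up for illustration, and it assumes your JRE knows the charset under the name "windows-1252", which is the usual java.nio name for CP-1252), something like this should do:

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;

// Hypothetical helper, not part of any library.
class EncodingAwareReader {

    // Reads the whole file into a String, decoding its bytes with the given charset
    // instead of the platform default.
    static String readAll(String path, Charset charset) throws IOException {
        StringBuilder text = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(path), charset))) {
            char[] buffer = new char[4096];
            int read;
            while ((read = reader.read(buffer)) != -1) {
                text.append(buffer, 0, read);
            }
        }
        return text.toString();
    }
}
```

You would then call it with Charset.forName("windows-1252") for the client files and StandardCharsets.UTF_8 for everything else. The important part is that the charset is passed explicitly, so the result doesn't depend on the server's default encoding.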
Do we need to do anything at the systems-level for adding CP-1252 to CentOS? If so, what does this involve?
That depends on what the JRE supports. I've never used CentOS, so I don't know whether it's likely to come with the relevant encoding as part of the JRE. You can use Charset.isSupported to check though, and Charset.availableCharsets to list what's available.
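A quick way to run that check on the server itself might look like the following. Again, the class and method names are only illustrative, and "windows-1252" is assumed to be the name you want to test for:

```java
import java.nio.charset.Charset;
import java.util.Set;

// Hypothetical helper, not part of any library.
class CharsetCheck {

    // True if this JRE can decode and encode the named charset.
    // Throws IllegalCharsetNameException if the name itself is malformed.
    static boolean isAvailable(String charsetName) {
        return Charset.isSupported(charsetName);
    }

    // Canonical names of every charset this JRE provides, in sorted order.
    static Set<String> availableNames() {
        return Charset.availableCharsets().keySet();
    }
}
```

Calling CharsetCheck.isAvailable("windows-1252") on the CentOS box tells you whether anything needs installing at all, and printing CharsetCheck.availableNames() shows the full list if the name turns out to be different on your JRE.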