java android encoding character-encoding utf-16le

Unable to use encoding UTF-16LE on Android Version 9


I have an application that creates a CSV file. The file is then imported by an Excel macro, which requires the file to be encoded as UTF-16LE. The problem is that I am not able to produce this encoding on some devices.

Until now, I used the charset UTF-16 to create the file. When I opened the file in Notepad++, it reported the encoding as UTF-16LE. Now I have a new device, and when I create the CSV file on it, Notepad++ reports the encoding as UTF-16BE. As a result, I get an error when I try to import the file with the Excel macro.

I tried specifying the encoding as UTF-16LE, which should be a valid charset according to the Android developer documentation. But then Notepad++ doesn't recognise the encoding of my file and the Excel macro is not able to read it (on both the old and the new device).

In both cases I can convert the file to UTF-16LE in Notepad++ and then import it successfully with my macro, but I need the app itself to create the file in the correct format.

The older device runs Android 5.1.

The newer device runs Android 9.0.

Here is my code:

File file = new File("some_name");
if (file.exists()) {
    file.delete();
}
file.getParentFile().mkdirs();
file.createNewFile();

Writer osw = new OutputStreamWriter(new FileOutputStream(file),"UTF-16LE"); //Or "UTF-16"
osw.write("foo");
osw.write("bar");
osw.close();

How can I use UTF-16LE encoding on the new device?


I did take a look at this answer and implemented it like this:

Writer osw = new OutputStreamWriter(new FileOutputStream(file),"UTF-16LE"); //Or "UTF-16"
osw.write(new String(("foo").getBytes("UTF-16LE"), "UTF-16LE"));
osw.write(new String(("bar").getBytes("UTF-16LE"), "UTF-16LE"));
osw.close();

I also tried StandardCharsets.UTF_16LE, but it didn't change anything. The encoding is still not recognized by Notepad++, and the file still cannot be imported by the macro.
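
For context, the Java charset documentation states that the UTF-16LE and UTF-16BE encoders do not write a byte order mark, whereas the plain UTF-16 encoder does. That may be why Notepad++ cannot identify a file written with UTF-16LE alone. The following is a minimal sketch for inspecting the bytes each charset produces. The class name Utf16Bytes and its helper methods are made up for illustration, and it is meant for a desktop JVM or a unit test rather than as a definitive fix.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class Utf16Bytes {
    // Formats a byte array as space-separated upper-case hex pairs.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Encodes the text with the given charset and returns the hex dump.
    static String encodeHex(String text, Charset charset) {
        return hex(text.getBytes(charset));
    }

    public static void main(String[] args) {
        // UTF-16LE on its own: little-endian code units, no byte order mark.
        System.out.println("UTF-16LE:       " + encodeHex("foo", StandardCharsets.UTF_16LE));
        // UTF-16LE with U+FEFF written first: the mark is encoded as FF FE.
        System.out.println("UTF-16LE + BOM: " + encodeHex("\uFEFFfoo", StandardCharsets.UTF_16LE));
        // Plain UTF-16: byte order and mark are chosen by the runtime's encoder.
        System.out.println("UTF-16:         " + encodeHex("foo", StandardCharsets.UTF_16));
    }
}
```

The first two lines should print `66 00 6F 00 6F 00` and `FF FE 66 00 6F 00 6F 00`. The third line shows whatever the runtime's plain UTF-16 encoder emits, which is the part that differed between the two devices described above.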


Solution

  • You can try to explicitly force UTF-16LE encoding with a byte order mark (BOM) that identifies the encoding. Both Notepad++ and Excel can interpret the BOM. As an example, the following code produces a CSV file that starts with a byte order mark and is suitable for import into Excel:

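    // Needs imports for android.content.Context, java.io.File, java.io.FileOutputStream,
    // java.io.IOException, java.io.OutputStreamWriter and java.nio.charset.StandardCharsets.
    public void writeFile(Context context) throws IOException {
        File file = new File(context.getFilesDir(), "some_name.csv");
        if (file.exists()) file.delete();
    
        OutputStreamWriter osw = new OutputStreamWriter(new FileOutputStream(file), StandardCharsets.UTF_16LE);
    
        // The byte order mark is the character U+FEFF. Encoded as UTF-16LE it is
        // written to the file as the two bytes FF FE.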
        osw.write("\ufeff");
        osw.write("Header1,Header2,Header3,Header4\n");
        osw.write("text1,text2,text3,text4\n");
        osw.write("text5,text6,text7,text8");
        osw.close();
    }
    

    The resulting file produced on an Android 9 emulator looks like this:

    PS C:\> format-hex some_name.csv
    
               Path: C:\some_name.csv
    
               00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F
    
    00000000   FF FE 48 00 65 00 61 00 64 00 65 00 72 00 31 00  .þH.e.a.d.e.r.1.
    00000010   2C 00 48 00 65 00 61 00 64 00 65 00 72 00 32 00  ,.H.e.a.d.e.r.2.
    00000020   2C 00 48 00 65 00 61 00 64 00 65 00 72 00 33 00  ,.H.e.a.d.e.r.3.
    00000030   2C 00 48 00 65 00 61 00 64 00 65 00 72 00 34 00  ,.H.e.a.d.e.r.4.
    00000040   0A 00 74 00 65 00 78 00 74 00 31 00 2C 00 74 00  ..t.e.x.t.1.,.t.
    00000050   65 00 78 00 74 00 32 00 2C 00 74 00 65 00 78 00  e.x.t.2.,.t.e.x.
    00000060   74 00 33 00 2C 00 74 00 65 00 78 00 74 00 34 00  t.3.,.t.e.x.t.4.
    00000070   0A 00 74 00 65 00 78 00 74 00 35 00 2C 00 74 00  ..t.e.x.t.5.,.t.
    00000080   65 00 78 00 74 00 36 00 2C 00 74 00 65 00 78 00  e.x.t.6.,.t.e.x.
    00000090   74 00 37 00 2C 00 74 00 65 00 78 00 74 00 38 00  t.7.,.t.e.x.t.8.
    

    Notepad++ interprets the file like this:

    [image: Notepad++ screenshot]

    And Excel displays the following after importing the file:

    [image: Excel screenshot]

    This doesn't answer your question directly, but you should be able to use this code with your data to help diagnose the problem.

    I left this in a comment, but I will mention it again here: the UTF decoder behavior changed in Android 9 to become more compliant with the Unicode standard. Those changes are described in the Android 9 behavior-changes documentation. I am not sure what effect, if any, this has in your situation, but it is worth a look (IMO).
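
To check a generated file without a hex viewer, the first bytes can be inspected for a byte order mark. This is a minimal sketch and not part of the original answer: BomSniffer and describeBom are hypothetical names, and it only distinguishes the UTF-8 and UTF-16 marks (it does not handle UTF-32).

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

class BomSniffer {
    // Classifies the leading bytes of a file by their byte order mark, if any.
    // Only the first `length` bytes of `head` are considered valid.
    static String describeBom(byte[] head, int length) {
        int b0 = length > 0 ? head[0] & 0xFF : -1;
        int b1 = length > 1 ? head[1] & 0xFF : -1;
        int b2 = length > 2 ? head[2] & 0xFF : -1;
        if (b0 == 0xEF && b1 == 0xBB && b2 == 0xBF) return "UTF-8 BOM";
        if (b0 == 0xFF && b1 == 0xFE) return "UTF-16LE BOM";
        if (b0 == 0xFE && b1 == 0xFF) return "UTF-16BE BOM";
        return "no BOM";
    }

    // Reads up to three bytes from the start of the file and classifies them.
    static String describeBom(File file) throws IOException {
        byte[] head = new byte[3];
        int total = 0;
        try (InputStream in = new FileInputStream(file)) {
            // read() may return fewer bytes than requested, so loop until
            // the buffer is full or the end of the file is reached.
            while (total < head.length) {
                int n = in.read(head, total, head.length - total);
                if (n < 0) break;
                total += n;
            }
        }
        return describeBom(head, total);
    }
}
```

Called on the file produced by `writeFile` above, `BomSniffer.describeBom(file)` should report "UTF-16LE BOM" if the mark was written as expected. A result of "no BOM" would suggest the file was written with UTF-16LE but without the leading U+FEFF.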