
Convert a .txt file to UCS-2 format


I have a .txt file that I want to convert to UCS-2 format.
What is the correct way to do this?
The file is about 700 MB, so I cannot open it in Notepad++ and convert it there.

Please suggest an approach.


Solution

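  • OK, so, first of all: Notepad++ shows ANSI, and ANSI is not a character encoding. According to this SO answer and various others, it appears to mean Windows-1252 here (strictly, it is the system's default code page, which is Windows-1252 on Western-locale Windows installations).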

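    As to UCS-2, it has been superseded by UTF-16, which can encode more code points: UCS-2 is limited to the Basic Multilingual Plane (U+0000 to U+FFFF), while UTF-16 reaches the rest of Unicode through surrogate pairs. Every character in Windows-1252 lies within that plane, so for this file the UTF-16 output is byte-for-byte what UCS-2 would produce, and using UTF-16 is OK here.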
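
    The claim that Windows-1252 text never needs more than one 16-bit unit per character can be checked directly. The following is a minimal sketch, not part of the original answer; the class name Cp1252InBmpCheck and the method allSingleUnit are hypothetical names chosen for illustration. It decodes each possible byte value as windows-1252 and confirms that the result is a single non-surrogate char that takes exactly two bytes in UTF-16LE.

    ```java
    import java.nio.charset.Charset;
    import java.nio.charset.StandardCharsets;

    // Hypothetical helper, for illustration only.
    final class Cp1252InBmpCheck {

        /**
         * Returns true if every byte value 0x00..0xFF, decoded as windows-1252,
         * yields exactly one non-surrogate char that encodes to 2 bytes in UTF-16LE.
         */
        static boolean allSingleUnit() {
            Charset cp1252 = Charset.forName("windows-1252");
            for (int b = 0; b < 256; b++) {
                // Bytes without a mapping are decoded to the replacement character.
                String s = new String(new byte[] {(byte) b}, cp1252);
                if (s.length() != 1 || Character.isSurrogate(s.charAt(0))) {
                    return false;
                }
                if (s.getBytes(StandardCharsets.UTF_16LE).length != 2) {
                    return false;
                }
            }
            return true;
        }

        public static void main(String[] args) {
            System.out.println(allSingleUnit());
        }
    }
    ```

    If the check holds, the converted file should be exactly twice the size of the original (plus two bytes if a byte order mark is added), which is a quick sanity test for a 700 MB input.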

    However, UTF-16, like UCS-2, depends on endianness. We will assume little endian here.

    Therefore:

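    // Imports needed by the snippet below
    import java.io.BufferedReader;
    import java.io.BufferedWriter;
    import java.nio.charset.Charset;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.nio.file.StandardOpenOption;

    final Path src = Paths.get("/path/to/original/file.txt");
    final Path dst = Paths.get("/path/to/destination/file.txt");
    
    final char[] buf = new char[1 << 20]; // 1 Mi chars (2 MB); the file is streamed in chunks
    int nrChars;
    
    // Decode as windows-1252 and re-encode as UTF-16LE (no byte order mark is written)
    try (
        final BufferedReader reader = Files.newBufferedReader(src, 
            Charset.forName("windows-1252"));
        // TRUNCATE_EXISTING avoids leaving stale bytes if dst already exists
        final BufferedWriter writer = Files.newBufferedWriter(dst,
            StandardCharsets.UTF_16LE, StandardOpenOption.CREATE,
            StandardOpenOption.TRUNCATE_EXISTING);
    ) {
        while ((nrChars = reader.read(buf, 0, buf.length)) != -1)
            writer.write(buf, 0, nrChars);
        writer.flush();
    }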
    

    This should work.
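
    One detail worth checking against whatever will read the result: StandardCharsets.UTF_16LE writes no byte order mark, and some tools use a leading U+FEFF to detect UTF-16. The following is a minimal sketch of the same streaming conversion wrapped in a method with an optional BOM. The class name ToUtf16Le, the method convert, and the writeBom parameter are hypothetical names introduced for illustration; whether the consumer of the file needs a BOM is an assumption to verify.

    ```java
    import java.io.BufferedReader;
    import java.io.BufferedWriter;
    import java.io.IOException;
    import java.nio.charset.Charset;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.Paths;

    // Hypothetical wrapper, for illustration only.
    final class ToUtf16Le {

        /** Streams src (decoded with srcCharset) into dst as UTF-16LE, optionally prefixed with a BOM. */
        static void convert(Path src, Path dst, Charset srcCharset, boolean writeBom)
                throws IOException {
            char[] buf = new char[1 << 16]; // chunked copy, so file size does not matter
            // With no open options, newBufferedWriter creates the file or truncates an existing one.
            try (BufferedReader reader = Files.newBufferedReader(src, srcCharset);
                 BufferedWriter writer = Files.newBufferedWriter(dst, StandardCharsets.UTF_16LE)) {
                if (writeBom) {
                    writer.write('\uFEFF'); // encoded as bytes FF FE in little endian
                }
                int n;
                while ((n = reader.read(buf, 0, buf.length)) != -1) {
                    writer.write(buf, 0, n);
                }
            }
        }

        public static void main(String[] args) throws IOException {
            if (args.length != 2) {
                System.err.println("usage: ToUtf16Le <source> <destination>");
                return;
            }
            // Assumes the source is windows-1252, as in the answer above.
            convert(Paths.get(args[0]), Paths.get(args[1]), Charset.forName("windows-1252"), true);
        }
    }
    ```

    Passing false for writeBom reproduces the behaviour of the snippet above. The try-with-resources block closes (and therefore flushes) the writer, so no explicit flush is needed in this version.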