java, file, encoding, utf-8, ansi

Read an ANSI file with Polish letters and show it in the console without accents


I have this line "ĆćĘęŁŹźł" in file.csv, which is encoded (as Notepad++ shows) as ANSI. How can I correctly show this line in the console as CcEeLZzl?

To remove the accents I'm using StringUtils.stripAccents(myLine) from Apache Commons Lang, but I still get "��Ee����".

        BufferedReader br = null;
        try {
            String sCurrentLine;
            // FileReader decodes the file with the platform default charset
            br = new BufferedReader(new FileReader(fileName2));
            while ((sCurrentLine = br.readLine()) != null) {
                System.out.println(StringUtils.stripAccents(sCurrentLine));
            }
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            try {
                if (br != null)
                    br.close();
            } catch (IOException ex) {
                ex.printStackTrace();
            }
        }

I want the console to show "CcEeLZzl", not "ĆćĘęŁŹźł". Please help me.

Solution

  • It looks like you want a custom mapping from Polish letters to ASCII, which is outside the domain of stripAccents. You would probably have to define it yourself, e.g. as done below (shown only for "Ł" and "ł").

    Spoiler: no, you don't have to. The culprit was the Windows "ANSI" encoding: the file was not being decoded with the right charset. With proper decoding, StringUtils.stripAccents worked fine (see comments). But if you ever leave the domain of stripAccents...

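    import java.util.Arrays;
    import java.util.HashMap;
    import java.util.Map;
    import java.util.stream.Collectors;

    public void Ll() {
        // Custom mapping for letters you want to replace by hand
        Map<String, String> map = new HashMap<>();
        map.put("Ł", "L");
        map.put("ł", "l");

        // Split into single-character strings, replace the mapped ones, join again
        System.out.println(Arrays.stream("ŁałaŁała".split("(?!^)"))
                .map(c -> map.getOrDefault(c, c))
                .collect(Collectors.joining()));
    }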
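
The decoding fix mentioned in the spoiler could look roughly like the sketch below. It rests on assumptions not confirmed in the question: that the "ANSI" file is actually windows-1250 (the usual Central European code page on Polish Windows), and that a JDK-only helper built on java.text.Normalizer is an acceptable stand-in for StringUtils.stripAccents. The class and method names (AnsiCsvReader, decodeAndStrip, readStripped) are hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.text.Normalizer;
import java.util.ArrayList;
import java.util.List;

public class AnsiCsvReader {

    // Assumption: "ANSI" here means windows-1250; adjust if the file uses another code page.
    static final Charset ANSI_PL = Charset.forName("windows-1250");

    // JDK-only stand-in for StringUtils.stripAccents: decompose each letter (NFD),
    // drop the combining marks, then map U+0141 / U+0142 (L with stroke) by hand,
    // because those two letters have no canonical decomposition.
    static String stripAccents(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "")
                .replace('\u0141', 'L')
                .replace('\u0142', 'l');
    }

    // Decode raw bytes with the assumed charset, then strip the accents.
    static String decodeAndStrip(byte[] raw) {
        return stripAccents(new String(raw, ANSI_PL));
    }

    // Read the file with an explicit charset instead of the platform default.
    static List<String> readStripped(Path file) throws IOException {
        List<String> out = new ArrayList<>();
        try (BufferedReader br = Files.newBufferedReader(file, ANSI_PL)) {
            String line;
            while ((line = br.readLine()) != null) {
                out.add(stripAccents(line));
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        for (String line : readStripped(Paths.get(args[0]))) {
            System.out.println(line);
        }
    }
}
```

The key change compared with the question's code is passing the charset explicitly rather than letting FileReader fall back to the platform default. If you keep Apache Commons Lang, you can presumably call StringUtils.stripAccents on each decoded line instead of the helper above.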