macos, utf-8, iconv

How to effectively convert utf-8-mac to utf-8


I regularly have to convert a file containing utf-8-mac strings to utf-8. I started doing this with iconv. However, iconv throws an error once the file contains too many lines to convert. Here is a script to reproduce the bug:

#!/bin/zsh
set -eu

for i in {1..1000}; do
  echo "$i:äöüß@€" >> /tmp/xx
  iconv -f utf-8-mac -t utf-8 /tmp/xx > /dev/null
done

Obviously I could split the file, but then I would end up with a lot of files.

Does anyone have another workaround or tool? Or a code example in Go?

I tried:

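package main

import (
    "log"
    "os"

    "golang.org/x/text/unicode/norm"
)

func main() {
    dat, err := os.ReadFile(".backup_files.unconv")
    if err != nil {
        log.Fatal(err)
    }
    output := ".backup_files.goconv"
    w, err := os.Create(output)
    if err != nil {
        log.Fatalf("Can't create %s, %v", output, err)
    }
    defer w.Close()
    // NFC composes decomposed sequences, e.g. "a" + U+0308 becomes "ä".
    wc := norm.NFC.Writer(w)
    defer wc.Close() // deferred calls run last-in, first-out, so this flushes before w is closed
    if _, err := wc.Write(dat); err != nil {
        log.Printf("Can't write %s, %v", output, err)
    }
}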

But its output differs from the iconv result. Thanks in advance.


Solution

  • Found an appropriate solution: You can use the uconv utility from ICU. Normalization is achieved through transliteration (-x).

    On Debian, Ubuntu and other derivatives, uconv is in the libicu-dev package. On Fedora, Red Hat and other derivatives, and in BSD ports, it's in the icu package.

    Thanks to Gilles 'SO- stop being evil'
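
    The answer above names the -x option but shows no command, so here is a minimal sketch of how the call might look. It assumes that NFC is the target form (the same form the Go attempt in the question uses) and that uconv is on the PATH. The helper name to_nfc is made up for illustration, and /tmp/xx is the file from the reproduction script in the question.

```shell
#!/bin/sh
# Sketch: normalize UTF-8 text to NFC with ICU's uconv.
# to_nfc is a hypothetical helper; it reads stdin and writes stdout.
to_nfc() {
  # -f / -t set the input and output encodings,
  # -x applies the Any-NFC transliterator to the decoded text.
  uconv -f utf-8 -t utf-8 -x any-nfc
}

# Example call (output file name is a placeholder):
# to_nfc < /tmp/xx > /tmp/xx.nfc
```

    Because uconv works as a stream filter here, there should be no need to split the input into many small files. Whether the result matches iconv's utf-8-mac output byte for byte on every input is something to check against your own data.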