I regularly have to convert a file that contains utf-8-mac strings to utf-8. I started out doing the job with iconv. However, iconv throws an error when there are too many lines to convert. Here is a script to reproduce the bug:
#!/bin/zsh
set -eu
for i in {1..1000}; do
    echo "$i:äöüß@€" >> /tmp/xx
    iconv -f utf-8-mac -t utf-8 /tmp/xx > /dev/null
done
Obviously I could split the file, but then I would end up with a lot of files.
Does anyone know another workaround or tool? Or have a code example in Go?
I tried:
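package main

import (
	"log"
	"os"

	"golang.org/x/text/unicode/norm"
)

func main() {
	dat, err := os.ReadFile(".backup_files.unconv")
	if err != nil {
		log.Fatal(err)
	}
	output := ".backup_files.goconv"
	w, err := os.Create(output)
	if err != nil {
		log.Fatalf("Can't create %s, %v", output, err)
	}
	defer w.Close()
	// norm.NFC.Writer composes decomposed sequences into NFC as it writes.
	wc := norm.NFC.Writer(w)
	if _, err := wc.Write(dat); err != nil {
		log.Fatal(err)
	}
	if err := wc.Close(); err != nil { // flushes any buffered runes
		log.Fatal(err)
	}
}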
But its output differs from the iconv result. Thanks in advance.
Found an appropriate solution:
You can use the uconv utility from ICU. Normalization is achieved through transliteration (-x).
On Debian, Ubuntu and other derivatives, uconv is in the libicu-dev package. On Fedora, Red Hat and other derivatives, and in BSD ports, it's in the icu package.
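Since the question also asked about Go, here is a rough sketch of driving uconv from a Go program instead of a shell. It assumes uconv is on PATH and that the any-nfc transliterator gives the normalization you want; nfcCommand is a hypothetical helper name, and the explicit -f/-t options are there only so the result does not depend on the locale.

```go
package main

import (
	"fmt"
	"io"
	"log"
	"os"
	"os/exec"
)

// nfcCommand builds a uconv invocation that reads from in and writes NFC to out.
// nfcCommand is a hypothetical helper name; "-x any-nfc" selects the ICU transliterator.
func nfcCommand(in io.Reader, out io.Writer) *exec.Cmd {
	cmd := exec.Command("uconv", "-f", "utf-8", "-t", "utf-8", "-x", "any-nfc")
	cmd.Stdin = in
	cmd.Stdout = out
	cmd.Stderr = os.Stderr
	return cmd
}

func main() {
	// Assumption: uconv is installed; otherwise report it and stop.
	if _, err := exec.LookPath("uconv"); err != nil {
		fmt.Fprintln(os.Stderr, "uconv not found on PATH; install ICU first")
		return
	}
	// Filter stdin to stdout so the program can sit in a pipeline.
	if err := nfcCommand(os.Stdin, os.Stdout).Run(); err != nil {
		log.Fatal(err)
	}
}
```

Built as a binary, it would be used like any other filter, with the input file redirected to stdin and the converted output redirected from stdout.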
Thanks to Gilles 'SO- stop being evil'