sed localization gettext data-conversion icu

Convert .po file into ICU4C .txt file


I've tried to create an ICU4C file from a gettext .po file with a sed script like this:

/^#/ d                            /* delete comments */
:a;/"$/{N;s/"\n"//;ba}            /* merge quoted lines in loop */
/^msgid /s/msgid (.*)/\1/         /* convert msgids */
s/msgstr "(.*)"/\{ "\1" }/        /* convert msgstrs */

and it already works pretty well (ignoring plural forms), but for some reason it doesn't convert the last msgid/msgstr pair unless I merge the quotes twice. But then the syntax for the other entries becomes wrong. Any ideas? It doesn't have to use sed.

Those ICU files are the only ones accepted by genrb, and I'd like to use the ResourceBundle in PHP.


Solution

  • I've accomplished my goal through a shell script. Here's the rough idea:

    #!/usr/bin/env bash
    
    # remove comments
    sed -r -e '/^#/ d' < de.po >de.icu.txt
    # merge strings
    sed -i de.icu.txt -r -e ':L;/"$/{N;s/"\n"//;b L}'
    # delete gettext header
    sed -i -e '1,2 d' de.icu.txt
    # convert into ICU format
    sed -i de.icu.txt -r -e '
    # delete untranslated
    /msgid ".+"/{
        N
        /msgstr ""/{
            N;s/msgid ".+"\nmsgstr ""\n//
        }
    }
    # generate ICU txt
    /msgid /s/msgid (.*)/\1/
    s/msgstr "(.*)"/\{ "\1" }/'
    sed -i -e '1i de {' -e '$ a\\n}' de.icu.txt
    

    There's probably a nicer way, but it does the job.
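
For comparison, the same steps (drop comments, merge continued strings, skip the header and untranslated entries, wrap everything in a locale table) can be sketched as a single awk pass. This is only a sketch under assumptions: `po_to_icu` is a hypothetical helper name, the locale is passed as its first argument, plural forms and `msgctxt` are ignored just like in the script above, and string contents are copied through unchanged. Because the last entry is written from the `END` block, it does not depend on a trailing blank line. That may also be what bites the one-pass sed script in the question: when `N` is reached on the last input line, sed stops without running the commands that follow.

```shell
#!/usr/bin/env bash

# po_to_icu LOCALE: read a .po file on stdin, write ICU text on stdout.
# Hypothetical helper for illustration; handles simple msgid/msgstr pairs only.
po_to_icu() {
    awk -v locale="$1" '
    # Print the pending entry if both sides are non-empty, then reset state.
    # An empty msgid is the gettext header; an empty msgstr is untranslated.
    function emit() {
        if (id != "" && str != "")
            printf "    \"%s\" { \"%s\" }\n", id, str
        id = ""; str = ""; field = ""
    }
    # Strip everything up to the opening quote and from the closing quote on.
    function unquote(s) {
        sub(/^[^"]*"/, "", s)
        sub(/"[^"]*$/, "", s)
        return s
    }
    BEGIN      { print locale " {" }
    /^#/       { next }                                   # comments
    /^msgid /  { emit(); id = unquote($0); field = "id"; next }
    /^msgstr / { str = unquote($0); field = "str"; next }
    /^msg/     { field = ""; next }                       # plurals, msgctxt: ignored
    /^"/       {                                          # continuation of a string
                 if (field == "id")       id  = id  unquote($0)
                 else if (field == "str") str = str unquote($0)
                 next
               }
    END        { emit(); print "}" }
    '
}
```

It would be called as `po_to_icu de < de.po > de.icu.txt`, after which the result can be passed to genrb as before. Each key and value ends up on one line, which differs from the sed output only in whitespace.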