Is there a command to flag and clean different messages that have the same translation in a po file?


Is there a command to flag and clean different messages that have the same translation in a gettext PO file?

#: templates/translations.html:7161
msgid "Straightedges"
msgstr "Règles de précision"

#: templates/translations.html:11697
msgid "Straight hemostats"
msgstr "Règles de précision"

Is there a way to wipe all the translations in that case?


Solution

  • You can use the following Perl script for that task:

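    #!/usr/bin/env perl

    use strict;

    use Locale::PO;

    die "usage: $0 POFILE\n" unless $ARGV[0];

    # Entries are decoded from UTF-8 on load, so encode them again on output.
    binmode STDOUT, ':encoding(UTF-8)';
    my $entries = Locale::PO->load_file_asarray($ARGV[0], 'UTF-8')
        or die "$ARGV[0]: $!\n";

    # First pass: count how many entries share each translation.
    my %seen;

    foreach my $entry (@$entries) {
        ++$seen{$entry->dequote($entry->msgstr)};
    }

    # Second pass: blank every translation that occurs more than once.
    foreach my $entry (@$entries) {
        my $msgstr = $entry->dequote($entry->msgstr);
        # Uncomment to drop the whole entry instead of blanking its translation:
        #next if $seen{$msgstr} > 1;
        $entry->msgstr("") if $seen{$msgstr} > 1;
        print $entry->dump;
    }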
    

    You need the Perl library Locale::PO (CPAN distribution Locale-PO) for that. You can install it with the command sudo perl -MCPAN -e 'install Locale::PO'. Omit the sudo if you don't need it.

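    If you really want to delete the entries with duplicate translations, uncomment the line with next. My version just discards the translation, which is most probably what you really want. Note that all untranslated entries share the same empty translation, so with next uncommented they are dropped from the output as well as soon as there is more than one of them.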

    The solution oversimplifies a little bit. It does not support entries with plural forms or message contexts, but you probably don't need them anyway.
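
    The two-pass logic of the script can also be tried without Locale::PO. The following is only a minimal sketch in plain Perl under a few assumptions: the function name blank_duplicate_translations, the drop option and the msgid/msgstr hash keys are made up for illustration, the entries are hard-coded instead of being parsed from a PO file, and the third entry is invented sample data. Unlike the script above, it leaves entries that are already untranslated out of the count.

```perl
use strict;
use warnings;
use utf8;    # the sample strings below contain non-ASCII characters

# Hypothetical helper: takes a reference to an array of { msgid, msgstr }
# hashes and returns a new array reference in which every translation shared
# by more than one entry is blanked. Pass drop => 1 to remove those entries.
sub blank_duplicate_translations {
    my ($entries, %opt) = @_;

    # First pass: count how often each non-empty translation occurs.
    my %seen;
    for my $entry (@$entries) {
        ++$seen{ $entry->{msgstr} } if length($entry->{msgstr});
    }

    # Second pass: blank (or drop) the entries whose translation is shared.
    my @result;
    for my $entry (@$entries) {
        my %copy = %$entry;    # leave the caller's data untouched
        if (length($copy{msgstr}) && $seen{ $copy{msgstr} } > 1) {
            next if $opt{drop};
            $copy{msgstr} = '';
        }
        push @result, \%copy;
    }
    return \@result;
}

my $cleaned = blank_duplicate_translations([
    { msgid => 'Straightedges',      msgstr => 'Règles de précision' },
    { msgid => 'Straight hemostats', msgstr => 'Règles de précision' },
    { msgid => 'Scissors',           msgstr => 'Ciseaux' },    # invented entry
]);

binmode STDOUT, ':encoding(UTF-8)';
printf "msgid \"%s\"\nmsgstr \"%s\"\n\n", $_->{msgid}, $_->{msgstr}
    for @$cleaned;
```

    Calling the helper with drop => 1 corresponds to uncommenting the line with next in the script, except that untranslated entries are kept.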