PHP gettext reverse translate

My question is quite simple: I use gettext to translate URLs, so at request time I only have the translated version of the URL string.

Is there an easy way to get the base string back from the translated string?

What I had in mind was to automatically store the translated name in a database, aliased to the base string, each time I call my _u($string) function.

What I have currently:

function _u($string)
{
    if (empty($string))
        return '';
    else
        return dgettext('Urls', $string);
}

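What I was thinking about (a sketch; $pdo is assumed to be an open PDO connection to MySQL, and url_alias is a placeholder table name):

function _u($string)
{
    global $pdo;

    if (empty($string))
        return '';

    $translation = dgettext('Urls', $string);

    // translation is the primary key, so REPLACE overwrites any existing alias
    $stmt = $pdo->prepare('REPLACE INTO url_alias (translation, base) VALUES (?, ?)');
    $stmt->execute(array($translation, $string));

    return $translation;
}

function url_base($translation)
{
    global $pdo;

    $stmt = $pdo->prepare('SELECT base FROM url_alias WHERE translation = ?');
    $stmt->execute(array($translation));
    $base = $stmt->fetchColumn();

    // fetchColumn() returns false when no alias was recorded
    return $base === false ? null : $base;
}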

However, this doesn't seem to be the best way to do it. Also, if I remove the REPLACE part in production, I might miss a link or two that I never happened to visit beforehand.

EDIT: What I am mostly looking for is the parsing part of gettext. I must not miss any of the possible URLs, so any alternative solution would also need to include a parser.

EDIT2: Another difficulty has just been added. We must recognize a URL in any translation and map it back to the "base" translation so that the system can parse the URL in the base language.

Solution

Actually, the most straightforward way I can think of would be to decode the .mo files used for the translation, through a call to the msgunfmt utility.

Once you have the plaintext database, you can load it into any other kind of database and then run reverse searches against it.
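As a rough sketch of that reverse search, assume the catalog was dumped with something like `msgunfmt Urls.mo -o Urls.po` and that every entry is a simple single-line msgid/msgstr pair (no plurals, contexts, or multi-line strings). The function names `parsePo` and `reverseMap` are made up for this illustration.

```php
<?php
// Parse simple single-line msgid/msgstr pairs from PO text (as produced by msgunfmt).
// Sketch only: plural forms, msgctxt and multi-line strings are not handled.
function parsePo(string $po): array
{
    $map = [];
    $pattern = '/^msgid "((?:[^"\\\\]|\\\\.)*)"\s*\nmsgstr "((?:[^"\\\\]|\\\\.)*)"/m';
    preg_match_all($pattern, $po, $matches, PREG_SET_ORDER);
    foreach ($matches as $m) {
        $id  = stripcslashes($m[1]);
        $str = stripcslashes($m[2]);
        // Skip the header entry (empty msgid) and untranslated strings.
        if ($id !== '' && $str !== '') {
            $map[$id] = $str;
        }
    }
    return $map;
}

// Flip base => translation into translation => base, refusing ambiguous mappings.
function reverseMap(array $map): array
{
    $reverse = [];
    foreach ($map as $base => $translated) {
        if (isset($reverse[$translated])) {
            throw new RuntimeException("Ambiguous translation: $translated");
        }
        $reverse[$translated] = (string) $base;
    }
    return $reverse;
}
```

The resulting array could be cached or written to a table, so the lookup does not depend on which pages have been visited.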

But perhaps better, you could create additional domain(s) (e.g. "ReverseUrlsIT") in which the translated URL is the key and the base string is the value. This only works if the mapping is fully two-way, that is, if no two base strings share the same translation.

At that point you can use dgettext to recover the base string from the translated string, provided that you know the language of the translated string.
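A reverse domain could be generated from the forward mapping rather than maintained by hand. The sketch below builds the PO source for it; `poQuote` and `buildReversePo` are hypothetical helper names, and the forward map is assumed to be already available as a PHP array (base => translation).

```php
<?php
// Quote a string as a PO literal, escaping backslashes, quotes and newlines.
function poQuote(string $s): string
{
    return '"' . strtr($s, ["\\" => "\\\\", "\"" => "\\\"", "\n" => "\\n"]) . '"';
}

// Build PO source in which the translated URL is the msgid and the base is the msgstr.
function buildReversePo(array $forward): string
{
    $out = "msgid \"\"\nmsgstr \"\"\n\"Content-Type: text/plain; charset=UTF-8\\n\"\n";
    foreach ($forward as $base => $translated) {
        $out .= "\nmsgid " . poQuote((string) $translated)
              . "\nmsgstr " . poQuote((string) $base) . "\n";
    }
    return $out;
}
```

The output would then be compiled with `msgfmt -o ReverseUrlsIT.mo ReverseUrlsIT.po` and installed next to the Italian catalog, after which `dgettext('ReverseUrlsIT', 'prodotti')` should return the base string while the Italian locale is active.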

Update

"This is the main point of using gettext and I would drop it anytime if I could find another parser/library/tool that could help with that"

The gettext family of functions is, after all is said and done, little more than a key-value store with (maybe) a parser that is a little more powerful than printf, to handle plurals and adjective/noun inversions ("violin virtuoso" in English becomes "virtuoso di violino" in Italian).

At the cost of adding to the database complexity (and load), you can build a key-value store on top of whatever persistence layer you have handy (gettext is file-based, after all):
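To illustrate the inversion, here is a small sketch that uses PHP's positional sprintf specifiers as a stand-in for what a catalog entry might contain. The patterns and the `renderPhrase` helper are invented for the example; in practice the format strings would come from the translation catalog.

```php
<?php
// Render a phrase from a per-language pattern with positional arguments.
function renderPhrase(string $pattern, array $args): string
{
    return vsprintf($pattern, $args);
}

// Same argument order, different word order per language.
$english = renderPhrase('%1$s %2$s', ['violin', 'virtuoso']);
$italian = renderPhrase('%2$s di %1$s', ['violino', 'virtuoso']);

echo $english, "\n"; // violin virtuoso
echo $italian, "\n"; // virtuoso di violino
```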

TABLE LanguageDomain
{
    PRIMARY KEY ldId;
    varchar(?)  ldValue;
}
# e.g.
# 39   it_IT
# 44   en_GB
# 01   en_US

TABLE Shorthand
{
    PRIMARY KEY shId;
    varchar(?)  shValue;
}

# e.g.
# 1    CAMERA
# 2    BED

TABLE Translation
{
    KEY t_ldId,
        t_shId;
    varchar(?)  t_Value;   // Or one value for singular form, one for plural...
}

# e.g.
# 44    1    Camera
# 39    1    Macchina fotografica
# 01    1    Camera
# 44    2    Bed
# 39    2    Letto
# 01    2    Bed
# 01  137    Behavior
# 44  137    Behaviour     # "American and English have many things in common..."
# 01  979    Cookie
# 44  979    Biscuit       # "...except of course the language" (O. Wilde)

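function translate($string, $arguments = array())
{
    // Assumes $pdo (an open PDO connection) and $languageDomain (e.g. 'it_IT') are set globally.
    global $pdo, $languageDomain;
    // First recover the main string. Inner joins restrict the result to the
    // requested language domain and shorthand.
    $stmt = $pdo->prepare(
        'SELECT t.t_Value FROM Translation AS t
            JOIN LanguageDomain AS l ON (t.t_ldId = l.ldId AND l.ldValue = :LangDom)
            JOIN Shorthand      AS s ON (t.t_shId = s.shId AND s.shValue = :String)'
    );
    $stmt->execute(array(':LangDom' => $languageDomain, ':String' => $string));
    $result = $stmt->fetchColumn();
    // Like gettext, fall back to the key itself when no translation exists
    if ($result === false)
        $result = $string;
    if (empty($arguments))
        return $result;
    // Now run replacement of arguments - if any
    $replacements = array();
    foreach (array_values($arguments) as $n => $argument)
        $replacements['$' . ($n + 1)] = translate($argument);
    // Now replace '$1' with translation of first argument, etc.
    // strtr() tries the longest key first, so '$10' is not mistaken for '$1'.
    return strtr($result, $replacements);
}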

This would allow you to easily add one more languageDomain, and even to run queries such as e.g. "What terms in English have not yet been translated into German?" (i.e., have a NULL value when LEFT JOINing the subset of Translation table with English domain Id with the subset with German domain Id).
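The "not yet translated" query could look roughly like the sketch below. It reuses the table and column names from the schema above; the function name `untranslated` is hypothetical, and `$pdo` is assumed to be an open PDO connection to whichever database holds those tables.

```php
<?php
// List shorthands that have a translation in $from but none in $to.
// Sketch only: returns an empty list if $to is not present in LanguageDomain.
function untranslated(PDO $pdo, string $from, string $to): array
{
    $sql = 'SELECT s.shValue
              FROM Shorthand AS s
              JOIN Translation    AS src ON src.t_shId = s.shId
              JOIN LanguageDomain AS ls  ON ls.ldId = src.t_ldId AND ls.ldValue = :from
              JOIN LanguageDomain AS lt  ON lt.ldValue = :to
              LEFT JOIN Translation AS dst
                     ON dst.t_shId = s.shId AND dst.t_ldId = lt.ldId
             WHERE dst.t_Value IS NULL
             ORDER BY s.shValue';
    $stmt = $pdo->prepare($sql);
    $stmt->execute([':from' => $from, ':to' => $to]);
    return $stmt->fetchAll(PDO::FETCH_COLUMN);
}
```

The LEFT JOIN keeps every source-language term, and the `IS NULL` filter leaves only those without a matching row in the target language.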

This system is interoperable with PO files, which is important if you need to outsource the translation to someone using the standard tools of the trade. But you can just as easily output a query directly to TMX format, eliminating duplicates. In some cases this might really cut down your translation costs: several services overcharge for input in "strange" formats such as Excel, and will either overcharge for "deduping" or charge for each duplicate as if it were an original.

<?xml version="1.0" ?>
<tmx version="1.4">
        <header
                creationtool="MySQLgetText"
                creationtoolversion="0.1-20120827"
                datatype="PlainText"
                segtype="sentence"
                adminlang="en-us"
                srclang="EN"
                o-tmf="ABCTransMem">
        </header>
        <body>
                <tu tuid="BED" datatype="plaintext">
                        <tuv xml:lang="en">
                                <seg>bed</seg>
                        </tuv>
                        <tuv xml:lang="it">
                                <seg>letto</seg>
                        </tuv>
                </tu>
                <tu tuid="CAMERA" datatype="plaintext">
                        <tuv xml:lang="en">
                                <seg>camera</seg>
                        </tuv>
                        <tuv xml:lang="it">
                                <seg>macchina fotografica</seg>
                        </tuv>
                </tu>
        </body>
</tmx>
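A document like the one above could be produced with a few lines of PHP. This is a minimal sketch: `xmlEscape` and `buildTmx` are hypothetical names, the header values are copied from the sample, and the input is assumed to be an array keyed by shorthand, which also means each shorthand is emitted only once.

```php
<?php
// Escape text for XML element content and attribute values.
function xmlEscape(string $s): string
{
    return htmlspecialchars($s, ENT_XML1 | ENT_QUOTES, 'UTF-8');
}

// $units maps a shorthand (used as tuid) to [language code => segment text].
function buildTmx(array $units, string $srcLang = 'EN'): string
{
    $xml  = "<?xml version=\"1.0\" ?>\n";
    $xml .= "<tmx version=\"1.4\">\n";
    $xml .= '  <header creationtool="MySQLgetText" creationtoolversion="0.1-20120827"'
          . ' datatype="PlainText" segtype="sentence" adminlang="en-us"'
          . ' srclang="' . xmlEscape($srcLang) . '" o-tmf="ABCTransMem"/>' . "\n";
    $xml .= "  <body>\n";
    foreach ($units as $tuid => $segments) {
        $xml .= '    <tu tuid="' . xmlEscape((string) $tuid) . '" datatype="plaintext">' . "\n";
        foreach ($segments as $lang => $text) {
            $xml .= '      <tuv xml:lang="' . xmlEscape((string) $lang) . '">'
                  . '<seg>' . xmlEscape((string) $text) . '</seg></tuv>' . "\n";
        }
        $xml .= "    </tu>\n";
    }
    $xml .= "  </body>\n</tmx>\n";
    return $xml;
}
```

For example, `buildTmx(['BED' => ['en' => 'bed', 'it' => 'letto']])` yields a single translation unit with one `tuv` element per language.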