bash awk sed xmlstarlet

Bash - Search for a specific string in a file and replace with immediate source


I am trying to search for the string "missing" in a file containing the following:

<message>
    <source>TypeA</source>
    <translation>missing</translation>
</message>
<message>
    <source>TypeB</source>
    <translation>missing</translation>
</message>
<message>
    <source>TypeC</source>
    <comment>Context menu</comment>
    <translation>missing</translation>
</message>

If "missing" is found, I want to replace it with its immediate source name, like this:

<message>
    <source>TypeA</source>
    <translation>TypeA</translation>
</message>
<message>
    <source>TypeB</source>
    <translation>TypeB</translation>
</message>
<message>
    <source>TypeC</source>
    <comment>Context menu</comment>
    <translation>TypeC</translation>
</message>

So far I have been able to use awk to find the string and print the immediate source name:

match($0, /<source>(.*)<\/source>/,n){ src=n[1] }
match($0, /<translation>(.*)<\/translation>/,s){ trs=s[1] }
/missing/{ print "Translation missing or incomplete for: '" trs "'","located inside source named: '" src "'" }

I save it as something.awk and call it using:

awk -f something.awk filelocation

But I am not sure how to replace the string "missing" with the value from source.

Can anyone suggest how I can replace it?


Solution

  • You can try this (write this in something.awk):

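    {
        # Remember the text of the most recent <source> element.
        if ($0 ~ /<source>/) {
            source = gensub(/.*<source>(.*)<\/source>.*/, "\\1", 1, $0)
        }
        # Replace the placeholder with the remembered source text.
        if ($0 ~ /<translation>missing/) {
            $0 = gensub(/>.*</, ">" source "<", 1, $0)
        }
        print
    }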
    

    Note that gensub is a GNU awk (gawk) extension, so this script needs gawk rather than a plain POSIX awk. It works on my computer when I run:

    awk -f something.awk filelocation
    

    Result:

    <message>
        <source>TypeA</source>
        <translation>TypeA</translation>
    </message>
    <message>
        <source>TypeB</source>
        <translation>TypeB</translation>
    </message>
    <message>
        <source>TypeC</source>
        <comment>Context menu</comment>
        <translation>TypeC</translation>
    </message>
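
If gawk is not available, the same idea can probably be expressed with only POSIX awk features (sub, index, substr) instead of gensub. The following is a minimal sketch under that assumption; the temporary directory and the file names sample.xml and portable.awk are hypothetical and exist only for this illustration.

```shell
# Work in a throwaway directory (hypothetical paths, for illustration only).
tmp=$(mktemp -d)

cat > "$tmp/sample.xml" <<'EOF'
<message>
    <source>TypeA</source>
    <translation>missing</translation>
</message>
<message>
    <source>TypeB</source>
    <translation>missing</translation>
</message>
<message>
    <source>TypeC</source>
    <comment>Context menu</comment>
    <translation>missing</translation>
</message>
EOF

cat > "$tmp/portable.awk" <<'EOF'
# Remember the text of the most recent <source> element.
/<source>/ {
    src = $0
    sub(/.*<source>/, "", src)
    sub(/<\/source>.*/, "", src)
}
{
    # Splice the remembered source text in place of the placeholder.
    # substr() is used instead of sub() so that "&" in src is taken literally.
    i = index($0, "<translation>missing</translation>")
    if (i > 0) {
        i += length("<translation>")
        $0 = substr($0, 1, i - 1) src substr($0, i + length("missing"))
    }
    print
}
EOF

result=$(awk -f "$tmp/portable.awk" "$tmp/sample.xml")
printf '%s\n' "$result"
rm -rf "$tmp"
```

Like the gensub version, this sketch still assumes one tag per line and that source appears before translation inside each message.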
    

    Note that this can go badly wrong if the order of the tags is not respected (or if you have multiple tags per line, ...). Having another tag between source and translation is not a problem, but source must come before translation. If that is not the case, you may need to parse your file with a proper XML parser (which awk isn't), make your changes there, and write the result back to a file.
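
If a real XML parser is not an option, the ordering restriction mentioned above can be relaxed a little by buffering each message block and only printing it once its closing tag has been seen. This is a rough sketch in POSIX awk, not a substitute for an XML parser: it still assumes one tag per line and that `<message>` and `</message>` each sit on their own line. The file names reordered.xml and buffered.awk are hypothetical.

```shell
# Work in a throwaway directory (hypothetical paths, for illustration only).
tmp=$(mktemp -d)

# Sample input where <translation> comes BEFORE <source> in the first block.
cat > "$tmp/reordered.xml" <<'EOF'
<message>
    <translation>missing</translation>
    <source>TypeA</source>
</message>
<message>
    <source>TypeB</source>
    <translation>done</translation>
</message>
EOF

cat > "$tmp/buffered.awk" <<'EOF'
# Start of a message block: reset the buffer and the remembered source.
/<message>/ { n = 0; src = ""; inmsg = 1 }

inmsg {
    buf[++n] = $0
    if ($0 ~ /<source>/) {
        src = $0
        sub(/.*<source>/, "", src)
        sub(/<\/source>.*/, "", src)
    }
    # End of the block: the source is now known, so emit the buffered lines.
    if ($0 ~ /<\/message>/) {
        for (k = 1; k <= n; k++) {
            line = buf[k]
            i = index(line, "<translation>missing</translation>")
            if (i > 0) {
                i += length("<translation>")
                line = substr(line, 1, i - 1) src substr(line, i + length("missing"))
            }
            print line
        }
        inmsg = 0
    }
    next
}

# Lines outside any message block pass through unchanged.
{ print }

# An unterminated final block is flushed as-is rather than dropped.
END { if (inmsg) for (k = 1; k <= n; k++) print buf[k] }
EOF

result=$(awk -f "$tmp/buffered.awk" "$tmp/reordered.xml")
printf '%s\n' "$result"
rm -rf "$tmp"
```

The trade-off is that a whole message block is held in memory before it is printed, which should be fine for blocks of a few lines like the ones in the question.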