bash, shell, sed, xmlstarlet

Add leading zero to numbers between special tags


I know it is not complicated to add leading zeros to numbers. However, I am looking for an optimal way to add leading zeros only to the values inside <SpecialTag> elements (for example <SpecialTag>0</SpecialTag>), padding them to 5 digits.

<?xml version="1.0" encoding="ISO-8859-1" standalone="yes" ?>
<Root>
    <Row>
        <Tag1>0</Tag1>
        <SpecialTag>0</SpecialTag>
        <Tag2>0</Tag2>
    </Row>
    <Row>
        <Tag1>0</Tag1>
        <SpecialTag>12</SpecialTag>
        <Tag2>0</Tag2>
    </Row>
    <Row>
        <Tag1>0</Tag1>
        <SpecialTag>12345</SpecialTag>
        <Tag2>0</Tag2>
    </Row>
    <Row>
        <Tag1>0</Tag1>
        <SpecialTag>1234</SpecialTag>
        <Tag2>0</Tag2>
    </Row>
</Root>

The expected result is:

<?xml version="1.0" encoding="ISO-8859-1" standalone="yes" ?>
<Root>
    <Row>
        <Tag1>0</Tag1>
        <SpecialTag>00000</SpecialTag>
        <Tag2>0</Tag2>
    </Row>
    <Row>
        <Tag1>0</Tag1>
        <SpecialTag>00012</SpecialTag>
        <Tag2>0</Tag2>
    </Row>
    <Row>
        <Tag1>0</Tag1>
        <SpecialTag>12345</SpecialTag>
        <Tag2>0</Tag2>
    </Row>
    <Row>
        <Tag1>0</Tag1>
        <SpecialTag>01234</SpecialTag>
        <Tag2>0</Tag2>
    </Row>
</Root>

Solution

  • Using xsltproc (Suggested solution!):

    With the XSLT file transform.xsl:

    <?xml version="1.0"?>
    <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    
        <xsl:template match="@*|node()">
            <xsl:copy>
                <xsl:apply-templates select="@*|node()"/>
            </xsl:copy>
        </xsl:template>
    
        <!-- Matches the SpecialTag -->
        <xsl:template match="SpecialTag">
            <xsl:copy>
                <!-- The number is available using node() and format-number() applies the 0-padding -->
                <xsl:value-of select="format-number(node(), '00000')" />
            </xsl:copy>
        </xsl:template>
    
    </xsl:stylesheet>
    

    Run the following, provided that input.xml contains your XML:

    $ xsltproc transform.xsl input.xml
    

  • Unsafe solutions:

    These rely on the assumption that the opening tag <SpecialTag> and the closing tag </SpecialTag> are on the same line, and that there is only one such element per line.

    The solutions below are mentioned only because the question was explicitly tagged with shell text-processing tools such as sed. They are not the right tools for the job!

    They all use a regular expression to match <SpecialTag>, followed by one or more digits and then </SpecialTag>, and replace the captured digits with a zero-padded version of them.

    Using sed:

    sed --regexp-extended 's@<SpecialTag>([0-9]+)</SpecialTag>@<SpecialTag>0000000\1</SpecialTag>@;s@<SpecialTag>0*([0-9]{5,})</SpecialTag>@<SpecialTag>\1</SpecialTag>@'
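
The sed approach works in two steps: it first prepends more zeros than could ever be needed, then trims the surplus leading zeros so that at least five digits remain. The sketch below splits the two steps apart to show the intermediate value. It assumes a sed that accepts -E (the short spelling of --regexp-extended, supported by GNU and BSD sed), and the trimming step is written here anchored to the tag so that digits elsewhere on the line are left alone. The variable names are only illustrative.

```shell
# Sample line taken from the question's input.
line='<SpecialTag>12</SpecialTag>'

# Step 1: over-pad the value with seven zeros.
step1=$(printf '%s\n' "$line" |
    sed -E 's@<SpecialTag>([0-9]+)</SpecialTag>@<SpecialTag>0000000\1</SpecialTag>@')
printf '%s\n' "$step1"   # prints <SpecialTag>000000012</SpecialTag>

# Step 2: drop leading zeros, keeping at least five digits.
step2=$(printf '%s\n' "$step1" |
    sed -E 's@<SpecialTag>0*([0-9]{5,})</SpecialTag>@<SpecialTag>\1</SpecialTag>@')
printf '%s\n' "$step2"   # prints <SpecialTag>00012</SpecialTag>
```

Because the leading 0* gives back characters when the digit group would otherwise fall short of five, a value that is already five digits long (such as 12345) comes out unchanged.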
    

    Using perl:

    perl -pe 's@<SpecialTag>([0-9]+)</SpecialTag>@sprintf("<SpecialTag>%05d</SpecialTag>",$1)@e'
    

    Using awk (GNU awk is required, because gensub is a gawk extension):

    awk '{gsub( /<SpecialTag>[0-9]+<\/SpecialTag>/, sprintf("<SpecialTag>%05d</SpecialTag>", gensub(/[^0-9]/, "","g"))); print}'
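
The same-line assumption behind the unsafe solutions can be seen with a few hand-made inputs. This is a minimal sketch, assuming a sed that accepts -E; pad_special_tag is a hypothetical helper name, and its substitution pair is written in the style of the sed solution rather than copied from any tool's documentation.

```shell
# Hypothetical helper: pads the first <SpecialTag> value on each line to 5 digits.
pad_special_tag() {
    sed -E \
        -e 's@<SpecialTag>([0-9]+)</SpecialTag>@<SpecialTag>0000000\1</SpecialTag>@' \
        -e 's@<SpecialTag>0*([0-9]{5,})</SpecialTag>@<SpecialTag>\1</SpecialTag>@'
}

# One element on its own line: padded as intended.
ok=$(printf '%s\n' '<SpecialTag>12</SpecialTag>' | pad_special_tag)
printf '%s\n' "$ok"      # prints <SpecialTag>00012</SpecialTag>

# Two elements on one line: only the first one is padded.
two=$(printf '%s\n' '<SpecialTag>1</SpecialTag><SpecialTag>2</SpecialTag>' | pad_special_tag)
printf '%s\n' "$two"     # prints <SpecialTag>00001</SpecialTag><SpecialTag>2</SpecialTag>

# Element split across lines (still valid XML): nothing is padded.
split=$(printf '%s\n' '<SpecialTag>' '12' '</SpecialTag>' | pad_special_tag)
printf '%s\n' "$split"
```

An XML-aware tool such as the XSLT transform does not depend on how the document happens to be laid out in lines, which is why it is the suggested solution.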