
Trimming Down Output


I am trying to trim down an output in some code I'm working on, and for whatever reason can't get it to work.

version= wget --output-document=- https://dolphin-emu.org/download 2>/dev/null \ | grep 'version always-ltr' -m 1
until [[ "${version::2}" == "." ]];
    do version= echo "$version" | sed 's/^.//'
done
until [[ "${version: -1}" -ge "0" ]];
    do version= echo "$version" | sed 's/.$//'
done
echo $version

Initially, $version equals something long and clunky:

<td class="version always-ltr"><a href="/download/dev/8ecfa537a242de74d2e372e30d9d79b14584b2fb/">5.0-16101</a></td>

However, I only want the 5.0-xxxxx number. How do I do that? (Or what absolutely idiotic mistake am I making?)


Solution

  • If, as you show, your version string has the form:

    version='<td class="version always-ltr"><a href="/download/dev/8ecfa537a242de74d2e372e30d9d79b14584b2fb/">5.0-16101</a></td>'
    

    then a single sed substitution that captures the wanted value and writes it back as the first backreference is all you need, e.g.

    $ echo "$version" | sed 's/^.*">\([^<][^<]*\).*$/\1/'
    5.0-16101
    

    Here ^.*"> is greedy, so it matches from the beginning of the string through the final ">. The group \([^<][^<]*\) then captures the wanted text (one or more characters that are not <), the trailing .*$ consumes the rest of the line, and \1 substitutes the captured text for the whole line.

    To capture the result in a variable, use command substitution, i.e. var=$(command):

    ver=$(echo "$version" | sed 's/^.*">\([^<][^<]*\).*$/\1/')
    

    Note: HTML should really be processed with an HTML/XML-aware tool such as xmllint or xmlstarlet. There are far too many variations and caveats in what a download with wget or curl may return to rely solely on shell text processing to extract data consistently.
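
As for why the original attempt printed nothing useful: in bash, `version= command` does not store the command's output. It runs the command with an empty `version` in that command's environment only, and the shell variable is left untouched. The sketch below puts the command substitution from the answer next to a sed-free variant using parameter expansion. The variable names `html`, `ver`, `tmp` and `ver2` are made up for illustration, and the sample line is copied from the question rather than fetched, so treat this as a minimal sketch and not a robust HTML parser.

```shell
#!/usr/bin/env bash
# Sample line copied from the question (assumption: the real input comes from wget | grep).
html='<td class="version always-ltr"><a href="/download/dev/8ecfa537a242de74d2e372e30d9d79b14584b2fb/">5.0-16101</a></td>'

# Command substitution stores the pipeline's output in the shell variable.
ver=$(printf '%s\n' "$html" | sed 's/^.*">\([^<][^<]*\).*$/\1/')
printf '%s\n' "$ver"

# Alternative without sed, using bash parameter expansion:
tmp=${html%%'</a>'*}   # strip everything from the first "</a>" to the end
ver2=${tmp##*'>'}      # strip everything up to and including the last ">"
printf '%s\n' "$ver2"
```

Both variants depend on the exact shape of that one line, so the note about using an HTML-aware tool still applies if the page layout changes.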