bash shell escaping wget xmlstarlet

Unescape the ampersand (&) via XMLStarlet - Bugging &


This is a quite annoying, though probably simple, task. Following this guide, I wrote this:

#!/bin/bash

content=$(wget "https://example.com/" -O -)
ampersand=$(echo '\&')

xmllint --html --xpath '//*[@id="table"]/tbody' - <<<"$content" 2>/dev/null |
    xmlstarlet sel -t \
        -m "/tbody/tr/td" \
            -o "https://example.com" \
            -v "a//@href" \
            -o "/?A=1" \
            -o "$ampersand" \
            -o "B=2" -n

I successfully extract each link from the table and everything gets concatenated correctly. However, instead of a literal & I get this at the end of each link:

https://example.com/hello-world/?A=1\&amp;B=2

But actually, I was looking for something like:

https://example.com/hello-world/?A=1&B=2

The idea is to escape the character with a backslash (\&) so that it is left alone. Initially, I tried placing it directly in the command as -o "\&" \ instead of -o "$ampersand" \, dropping the ampersand=$(echo '\&') line. The result was the same.

Removing the backslash does not help either; the output is then:

https://example.com/hello-world/?A=1&amp;B=2

The only difference is that the \ in front of &amp; is gone.

Why?

I'm sure it is something basic that is missing.


Solution

  • Sorry, I can't reproduce your result, but why not make a substitution? Just filter your results through

    sed 's/\\&amp;/\&/g'

    by adding it to the end of your pipe. It replaces every \&amp; with &.
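
As a rough sketch of that filter, assuming the output lines look like the ones quoted in the question, the snippet below wraps the substitution in a small function. The name unescape_amp is made up for illustration, and the second sed expression is an addition (not part of the one-liner above) that also covers the variant without the backslash.

```shell
#!/bin/bash

# Hypothetical helper: turn the XML-escaped ampersand back into a plain "&".
# First expression: handles "\&amp;" (backslash variant from the question).
# Second expression: handles a bare "&amp;" (assumption: you want both fixed).
# In the replacement, "\&" means a literal "&"; an unescaped "&" would
# re-insert the whole match.
unescape_amp() {
    sed -e 's/\\&amp;/\&/g' -e 's/&amp;/\&/g'
}

# Sample lines copied from the question's output.
printf '%s\n' \
    'https://example.com/hello-world/?A=1\&amp;B=2' \
    'https://example.com/hello-world/?A=1&amp;B=2' |
    unescape_amp
```

In the original pipeline this would go after the xmlstarlet call, i.e. `... | unescape_amp`. The escaping most likely comes from `xmlstarlet sel` writing XML by default; it also has a `-T` (`--text`) option for plain-text output, which may make the filter unnecessary, though that was not tested against this page.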