Tags: regex, bash, macos, xpath, xidel

Creating an alias for xpath expression in xidel with regex and bash


If you have used Xidel, you know that you often need to locate nodes that have a certain class. To make this easier, I want to create a has-class("class") function that serves as an alias for the expression:
contains(concat(" ", normalize-space(@class), " "), " class ").
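To see why the expression pads the class attribute with spaces, here is a rough bash analogue of the same token test. It only illustrates the logic; it is not how Xidel evaluates XPath, and the function names normalize_space, has_class and naive_contains are hypothetical. A plain contains(@class, "wp-image") would also match class="wp-image-42". Surrounding both the normalized attribute and the class name with spaces restricts the match to whole, space-separated tokens.

```shell
#!/usr/bin/env bash
# Hypothetical bash analogue of the XPath test
#   contains(concat(" ", normalize-space(@class), " "), " CLASS ")
# for illustration only; Xidel does not run this code.

# normalize_space: trim and collapse whitespace, similar to XPath normalize-space().
normalize_space() {
    local -a words
    # read splits on IFS whitespace; -d '' reads past newlines.
    # read returns 1 when it hits EOF without a NUL, hence || true.
    read -rd '' -a words <<< "$1" || true
    printf '%s' "${words[*]}"   # rejoin the tokens with single spaces
}

# has_class ATTR CLASS: succeeds only if CLASS is a whole space-separated token of ATTR.
has_class() {
    local padded=" $(normalize_space "$1") "
    [[ $padded == *" $2 "* ]]
}

# naive_contains ATTR CLASS: plain substring test, like contains(@class, "CLASS").
naive_contains() {
    [[ $1 == *"$2"* ]]
}

for attr in "alignleft  wp-image" "wp-image-42"; do
    if has_class "$attr" wp-image; then echo "has_class matches '$attr'"; fi
    if naive_contains "$attr" wp-image; then echo "naive_contains matches '$attr'"; fi
done
```

Only naive_contains accepts wp-image-42, which is exactly the false positive that the padded expression avoids.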

Example:

$ e-xidel.sh example.com '//article/p//img[has-class("wp-image")]'

e-xidel.sh contains this code:

#!/bin/bash

echo -e "$(tput setaf 2) Checking... $(tput sgr0)"

path=$1
expression=$2

# expression = '//article/p//img[has-class("wp-image")]'
# Regex to replace every * has-class("class") * by * contains(concat(" ", normalize-space(@class), " "), " class ") *
# ...
# ...
# expression = '//article/p//img[contains(concat(" ", normalize-space(@class), " "), " wp-image ")]'

xoutput=$(xidel "$path" --printed-node-format=html --output-declaration= -e "$expression")

echo -e "$(tput setaf 1) $xoutput $(tput sgr0)"

Solution

  • You can use sed (GNU version; I cannot guarantee that it works with other implementations, such as the BSD sed that ships with macOS) to achieve this:

    sed 's/has-class("\([^)]\+\)")/contains(concat(" ", normalize-space(@class), " "), " \1 ")/g'
    

    Explanation:

    • s/pattern/substitution/g: replaces the portion matching the pattern with the substitution string; the g flag replaces every matching portion of the line instead of only the first one (global substitution).
    • has-class("\([^)]\+\)"): matches a portion that starts with has-class(", continues with one or more characters other than a closing parenthesis ([^)]\+), and ends with "). The escaped parentheses around the inner part capture that subportion and make it available as \1, since it is the first capture group.
    • contains(concat(" ", normalize-space(@class), " "), " \1 "): the text that replaces the matched portion; \1 is expanded to the content of the captured group.
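Applied to the example from the question, the command produces the expanded expression shown in the script's comments. This sketch assumes GNU sed (demo_expand is a hypothetical helper name) and also feeds it an expression with two has-class calls, which is where the g flag and the [^)] character class matter: each match stops at its own closing parenthesis, and both calls are rewritten.

```shell
#!/usr/bin/env bash
# Runs sample XPath expressions through the sed command from the answer.
# Assumes GNU sed: \+ inside a basic regular expression is a GNU extension.

demo_expand() {
    sed 's/has-class("\([^)]\+\)")/contains(concat(" ", normalize-space(@class), " "), " \1 ")/g'
}

# One has-class() call: \1 captures wp-image.
printf '%s\n' '//article/p//img[has-class("wp-image")]' | demo_expand

# Two calls on one line: the g flag rewrites both of them.
printf '%s\n' '//img[has-class("a") and has-class("b")]' | demo_expand
```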

    Your script would be:

    #!/bin/bash
    
    # Rewrites every has-class("...") call into the equivalent contains(...) test.
    function expand-has-class() {
        printf '%s\n' "$1" |
        sed 's/has-class("\([^)]\+\)")/contains(concat(" ", normalize-space(@class), " "), " \1 ")/g'
    }
    
    echo -e "$(tput setaf 2) Checking... $(tput sgr0)"
    
    path=$1
    expression="$(expand-has-class "$2")"
    
    # expand-has-class turns, for example,
    #   '//article/p//img[has-class("wp-image")]'
    # into
    #   '//article/p//img[contains(concat(" ", normalize-space(@class), " "), " wp-image ")]'
    
    xoutput=$(xidel "$path" --printed-node-format=html --output-declaration= -e "$expression")
    
    echo -e "$(tput setaf 1) $xoutput $(tput sgr0)"
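Because the question is tagged macos, where the default sed is BSD sed rather than GNU sed, a variant that avoids \+ (a GNU extension in basic regular expressions) may be useful. In this sketch, expand_has_class_portable is a hypothetical name. It spells "one or more" as [^)][^)]*, which is standard POSIX BRE, and uses printf instead of echo so that an expression that looks like an echo option (such as -n) is passed through unchanged.

```shell
#!/usr/bin/env bash
# Hypothetical portable variant of expand-has-class.
# [^)][^)]* means "one or more characters other than )" in plain POSIX BRE,
# so the command does not rely on GNU sed's \+ extension.
expand_has_class_portable() {
    # printf does no option or escape parsing on its argument, unlike echo
    printf '%s\n' "$1" |
    sed 's/has-class("\([^)][^)]*\)")/contains(concat(" ", normalize-space(@class), " "), " \1 ")/g'
}

expression=$(expand_has_class_portable '//article/p//img[has-class("wp-image")]')
printf '%s\n' "$expression"
```

To use it, replace the expand-has-class definition in the script with this function and set expression="$(expand_has_class_portable "$2")". An empty has-class("") is left untouched, just as with the GNU version.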