If you have used Xidel, you have often needed to locate nodes that have a certain class. To make this easier, I want to create a has-class("class")
function that serves as an alias for the expression:
contains(concat(" ", normalize-space(@class), " "), " class ")
.
Example:
$ e-xidel.sh example.com '//article/p//img[has-class("wp-image")]'
e-xidel.sh contains this code:
#!/bin/bash
echo -e "$(tput setaf 2) Checking... $(tput sgr0)"
path=$1
expression=$2
# expression = '//article/p//img[has-class("wp-image")]'
# Regex to replace every * has-class("class") * by * contains(concat(" ", normalize-space(@class), " "), " class ") *
# ...
# ...
# expression = '//article/p//img[contains(concat(" ", normalize-space(@class), " "), " wp-image ")]'
xoutput=$(xidel $path --printed-node-format=html --output-declaration= -e "$expression")
echo -e "$(tput setaf 1) $xoutput $(tput sgr0)"
You can use sed
(GNU version; the \+ quantifier is a GNU extension to basic regular expressions, so I cannot guarantee it will work with other implementations) to do this:
sed 's/has-class("\([^)]\+\)")/contains(concat(" ", normalize-space(@class), " "), " \1 ")/g'
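As a quick sanity check, the substitution can be run on the sample expression from the question before wiring it into the script. This is only a minimal sketch assuming GNU sed; the variable names expr and expanded are illustrative and not part of the original script.

```shell
#!/bin/bash
# Sample expression taken from the question.
expr='//article/p//img[has-class("wp-image")]'

# Apply the substitution (assumes GNU sed, because of \+ in a basic regex).
expanded=$(printf '%s\n' "$expr" |
  sed 's/has-class("\([^)]\+\)")/contains(concat(" ", normalize-space(@class), " "), " \1 ")/g')

# Show the rewritten XPath expression.
printf '%s\n' "$expanded"
```

With GNU sed this should print //article/p//img[contains(concat(" ", normalize-space(@class), " "), " wp-image ")], which is the expanded form the question asks for.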
Explanation:
s/pattern/substitution/g
: replace the portion matching the pattern with the substitution string; the g flag replaces every matching portion of the line (global substitution).
has-class("\([^)]\+\)")
: a portion starting with has-class(" , containing one or more characters other than the closing parenthesis ( [^)]\+ ), and ending with ") . The escaped parentheses surrounding the inner part capture that subportion and associate it with the alias \1 , since it is the first capture group.
contains(concat(" ", normalize-space(@class), " "), " \1 ")
: replace the matched portion with this text; \1 is expanded to the content of the associated capture group.
Your script would be:
#!/bin/bash
# Expand every has-class("class") in an XPath expression into
# contains(concat(" ", normalize-space(@class), " "), " class ")
function expand-has-class() {
  printf '%s\n' "$1" |
    sed 's/has-class("\([^)]\+\)")/contains(concat(" ", normalize-space(@class), " "), " \1 ")/g'
}
echo -e "$(tput setaf 2) Checking... $(tput sgr0)"
path=$1
# Before: '//article/p//img[has-class("wp-image")]'
expression="$(expand-has-class "$2")"
# After: '//article/p//img[contains(concat(" ", normalize-space(@class), " "), " wp-image ")]'
# Quote $path so URLs or file names with special characters are passed intact.
xoutput=$(xidel "$path" --printed-node-format=html --output-declaration= -e "$expression")
echo -e "$(tput setaf 1) $xoutput $(tput sgr0)"
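If the script also has to run where GNU sed is not available, one possible variant is to switch to extended regular expressions with sed -E, which to my knowledge is accepted by both GNU sed and the BSD sed shipped with macOS. The sketch below is an assumption on my part, not something tested against every sed; the function name expand_has_class (with underscores) and the sample expression are only illustrative. It also stops the capture at the closing double quote rather than only at the parenthesis.

```shell
#!/bin/bash
# Hypothetical portable variant of expand-has-class (name is illustrative).
expand_has_class() {
  # -E selects extended regular expressions: + and ( ) work without
  # backslashes, so the literal parentheses of has-class(...) are escaped.
  # [^")] ends the captured class name at the closing quote or parenthesis.
  printf '%s\n' "$1" |
    sed -E 's/has-class\("([^")]+)"\)/contains(concat(" ", normalize-space(@class), " "), " \1 ")/g'
}

# An expression with two calls, to exercise the g (global) flag.
result=$(expand_has_class '//div[has-class("a") and not(has-class("b"))]')
printf '%s\n' "$result"
```

Both calls should be expanded in a single pass, giving //div[contains(concat(" ", normalize-space(@class), " "), " a ") and not(contains(concat(" ", normalize-space(@class), " "), " b "))].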