Is it possible to specify a range of numbers (1-31) in the pattern I use to match a <strong> tag? The tag appears in the output as: <strong>21. Infinite Safari Balls</strong>.
Edited
#!/bin/bash
wget -q -O - 'goo.gl/vfYA94' | \
sed -En '/<strong>([1-9]|[12][0-9]|3[01])\./,/<\/blockquote>/p' | \
sed -e :a -e 's/<[^>]*>//g;/</N;//ba'
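As I understand it, you want to print each block of lines that starts with a line containing <strong>NN. (where NN is a number between 1 and 31) and ends with the next line containing </blockquote>. sed has no notion of numeric ranges, but you can get the same effect with a regular expression that spells out the alternatives: [1-9] covers 1-9, [12][0-9] covers 10-29, and 30|31 covers the remaining two values. The trailing \. requires a literal dot right after the number, so 32. or 100. cannot match on their leading digits: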
wget -q -O - 'goo.gl/vfYA94' | sed -En '/<strong>([1-9]|[12][0-9]|30|31)\./,/<\/blockquote>/p'
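To try the range match without fetching the page, here is a minimal sketch that runs the same sed address range over a small made-up sample. The functions sample_page and extract_blocks and the sample HTML are assumptions for illustration only; they are not taken from the real page.

```shell
#!/bin/bash
# Hypothetical sample input standing in for the downloaded page.
sample_page() {
  cat <<'EOF'
<p>intro</p>
<strong>0. Below range</strong>
<blockquote>skipped</blockquote>
<strong>21. Infinite Safari Balls</strong>
<blockquote>
example line
</blockquote>
<strong>32. Above range</strong>
<blockquote>skipped too</blockquote>
EOF
}

# Print every block that starts at a line containing <strong>N. (N in 1..31)
# and ends at the next line containing </blockquote>.
extract_blocks() {
  sed -En '/<strong>([1-9]|[12][0-9]|3[01])\./,/<\/blockquote>/p'
}

sample_page | extract_blocks
```

Only the block for entry 21 should be printed: 0 has no matching alternative, and 32 fails because none of the alternatives is followed by a literal dot.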
To reduce the number of backslashes in the regular expression, I used the -E option for extended regular expressions. The -E option is recognized on both Mac OS X and GNU/Linux, although older versions of GNU sed document only -r for this purpose.