I am parsing a WordPress shortcode and want to use PCRE, mainly with a view to finally getting my head around it.
The following shortcode is one I wish to parse:
[testing att1='hello' att2='hello again' att3='£100']
My current regexp is:
\s?([a-z0-9_]*='[[:graph:]\£]*')\s?
This matches att1 and att3 but not att2, due to the fact that it has whitespace in it. However, when I amend my regexp to:
\s?([a-z0-9_]*='[[:graph:]\s\£]*')\s?
(Note the '\s' after [:graph:].)
It matches from 'att1' to 'att3' in its entirety, i.e. att1='hello' att2='hello again' att3='£100'. How do I match att2 so that it includes the whitespace, while still treating whitespace as a delimiter between attributes?
I think my issue is how I specify where the group terminates, but I am unsure!
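The character class is too permissive: [[:graph:]] also matches ', so once \s is allowed the greedy * runs on to the last quote in the string. If you want to match attributes with single-quoted arguments, exclude the quote from the value instead: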
\w+='[^']*'
See the regex demo. Details:
\w+ - one or more letters, digits or underscores
=' - a =' string
[^']* - zero or more chars other than '
' - a ' char.
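
As a quick check of the pattern against the shortcode from the question, here is a minimal sketch in Python. Python's re module is not PCRE, but this pattern only uses syntax the two share, so it should behave the same way for this input. One difference to be aware of: in Python 3, \w also matches non-ASCII letters by default. The names ATTR_RE and PAIR_RE, and the second pattern with capture groups, are my own additions for illustration.

```python
import re

# The shortcode from the question.
shortcode = "[testing att1='hello' att2='hello again' att3='£100']"

# \w+    attribute name (letters, digits, underscores)
# ='     literal equals sign and opening quote
# [^']*  value: any characters except a single quote, so spaces are allowed
# '      closing quote
ATTR_RE = re.compile(r"\w+='[^']*'")

# Each attribute is matched separately, including the one containing a space.
attributes = ATTR_RE.findall(shortcode)
print(attributes)

# Same pattern with capture groups to split each match into name and value.
PAIR_RE = re.compile(r"(\w+)='([^']*)'")
pairs = dict(PAIR_RE.findall(shortcode))
print(pairs)
```

Because the value part can never consume a quote, each match has to stop at the first closing quote, which is what keeps att2 from swallowing att3. The shortcode name "testing" is skipped because it is not followed by ='.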