
Regexp to match options which are delimited by spaces and also have spaces in their content


I am parsing a WordPress shortcode and want to use PCRE, mainly with a view to finally getting my head around it.

The following shortcode is one I wish to parse:

[testing att1='hello' att2='hello again' att3='£100']

My current regexp is:

\s?([a-z0-9_]*='[[:graph:]\£]*')\s?

This matches att1 and att3 but not att2, because the value of att2 contains whitespace. However, when I amend my regexp to:

\s?([a-z0-9_]*='[[:graph:]\s\£]*')\s?   --- note the '\s' after [:graph:]

It matches everything from att1 to att3 as a single match, i.e. att1='hello' att2='hello again' att3='£100'. How do I match att2 including its whitespace, while still treating whitespace as the delimiter between attributes?

I think my issue is how I am specifying where the group ends, but I'm not sure!
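Both results can be reproduced with a short sketch. Python's `re` module is used here only for illustration, and it does not support POSIX classes such as `[[:graph:]]`, so the sketch assumes `[!-~]` (the printable ASCII characters other than space) as a stand-in. The key detail is that `[[:graph:]]` includes the apostrophe itself: in the first pattern a space ends the class, so att2's value can never reach a closing `'`, while in the second pattern `\s` lets the greedy `*` run on to the last `'` in the whole string.

```python
import re

# The shortcode from the question.
shortcode = "[testing att1='hello' att2='hello again' att3='£100']"

# Assumption: Python's re has no [[:graph:]], so [!-~] (0x21-0x7E) stands in
# for ASCII [[:graph:]]. Like [[:graph:]], this range includes the apostrophe.
first = r"\s?([a-z0-9_]*='[!-~£]*')\s?"
second = r"\s?([a-z0-9_]*='[!-~\s£]*')\s?"

# A space stops the class, so att2 cannot match.
print(re.findall(first, shortcode))
# ["att1='hello'", "att3='£100'"]

# With \s added, the greedy * runs to the end and backtracks to the LAST quote.
print(re.findall(second, shortcode))
# ["att1='hello' att2='hello again' att3='£100'"]
```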


Solution

  • If you want to match attributes with single-quoted arguments, you can use

    \w+='[^']*'
    

    Details:

    • \w+ - one or more letters, digits or underscores
    • =' - a =' string
    • [^']* - zero or more chars other than '
    • ' - a ' char.
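The pattern works because `[^']*` can never consume a quote, so each match ends at the first `'` after its opening quote rather than at the last one in the string. A minimal Python sketch (Python is an assumption here; the pattern uses only syntax that PCRE and Python's `re` share) applying it to the shortcode from the question, plus a hypothetical variant with capture groups that splits each attribute into a name and a value:

```python
import re

shortcode = "[testing att1='hello' att2='hello again' att3='£100']"

# The answer's pattern: a name, =', anything except a quote, then a closing '.
pattern = r"\w+='[^']*'"
print(re.findall(pattern, shortcode))
# ["att1='hello'", "att2='hello again'", "att3='£100'"]

# Hypothetical extension: with two groups, findall returns (name, value) tuples.
attrs = dict(re.findall(r"(\w+)='([^']*)'", shortcode))
print(attrs)
# {'att1': 'hello', 'att2': 'hello again', 'att3': '£100'}
```

This assumes values are always single-quoted and never contain an escaped `'`; double-quoted or unquoted values would need a wider pattern.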