
Regex: Match string with OR without string delimiter


I'm extracting fields from BibTeX and have a small problem: values can be wrapped in curly braces OR NOT. Please find the example text below:

@article{Roxas_2011, title={Social Desirability Bias in Survey Research on Sustainable Development in Small Firms: an Exploratory Analysis of Survey Mode Effect}, volume={21}, ISSN={1099-0836}, url={http://dx.doi.org/10.1002/bse.730}, DOI={10.1002/bse.730}, number={4}, journal={Business Strategy and the Environment}, publisher={Wiley}, author={Roxas, Banjo and Lindsay, Val}, year={2011}, month=sep, pages={223–235} }

As you can see, every field except month has the form x={y}, so a simple pattern (PHP preg_match with the mUg flags):

[\s,]+(.*)={(.*[^}])}

Does the trick for everything except month=sep.

If I try using ", " as the delimiter, it apparently also splits the authors. Can you please help me? :)

Thanks :)


Solution

  • You can use

    [\s,]+(.*?)=(?|{([^{}]*)}|(\w+))
    

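    Note that you should not use any flags with this regex; in particular, the U flag inverts quantifier greediness, which would turn the lazy (.*?) into a greedy one. If needed, you may add the s flag to make . match line break chars and the u flag to make \w and \s match all Unicode word/whitespace chars.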


    Details

    • [\s,]+ - one or more whitespaces or/and commas
    • (.*?) - Group 1: zero or more chars other than line break chars, as few as possible
    • = - a = char
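    • (?|{([^{}]*)}|(\w+)) - a branch reset group (each alternative numbers its capturing groups from the same starting point, so both alternatives below capture into Group 2) matching:
      • {([^{}]*)} - a { char, then zero or more chars other than { and } captured into Group 2, then a } char.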
      • | - or
      • (\w+) - Group 2: one or more word chars.
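A minimal PHP sketch of how the regex above could be used with preg_match_all to turn the example entry into a key/value array. The helper name parseBibtexFields is hypothetical, and the sketch assumes the entry fits on one line, as in the question (add the s flag otherwise).

```php
<?php
// Minimal sketch, assuming a single-line BibTeX entry as in the question.
// parseBibtexFields is a hypothetical helper name used for illustration.
function parseBibtexFields(string $entry): array
{
    // No m/U/g flags: U would make the lazy (.*?) greedy.
    $pattern = '/[\s,]+(.*?)=(?|{([^{}]*)}|(\w+))/';

    $fields = [];
    if (preg_match_all($pattern, $entry, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $m) {
            // Because of the branch reset group, the value is always in $m[2],
            // whether it was written as key={value} or key=value.
            $fields[$m[1]] = $m[2];
        }
    }
    return $fields;
}

$entry = '@article{Roxas_2011, title={Social Desirability Bias in Survey Research on '
    . 'Sustainable Development in Small Firms: an Exploratory Analysis of Survey Mode '
    . 'Effect}, volume={21}, ISSN={1099-0836}, url={http://dx.doi.org/10.1002/bse.730}, '
    . 'DOI={10.1002/bse.730}, number={4}, journal={Business Strategy and the Environment}, '
    . 'publisher={Wiley}, author={Roxas, Banjo and Lindsay, Val}, year={2011}, '
    . 'month=sep, pages={223–235} }';

$fields = parseBibtexFields($entry);
echo $fields['month'], PHP_EOL;  // sep
echo $fields['author'], PHP_EOL; // Roxas, Banjo and Lindsay, Val
```

The [^{}]* part stops the braced branch at the first closing brace, which is why the commas inside author={...} do not split the value. The flip side is that a value with nested braces, such as {The {PHP} Book}, would not be captured correctly by this pattern.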