Regex to match copyright statements


I don't know much about regex, and I'm trying to find a pattern that matches copyright statements such as:

'Copyright © 2019 Company All Rights Reserved'
'© 2019 Company All Rights Reserved'
'© 2019 Company'

And as many other combinations as possible.

I found this regex pattern in https://github.com/regexhq/copyright-regex/blob/master/index.js

/(?!.*(?:\{|\}|\);))(?:(copyright)[ \t]*(?:(&copy;|\(c\)|&#(?:169|xa9;)|©)[ \t]+)?)(?:((?:((?:(?:19|20)[0-9]{2}))[^\w\n]*)*)([ \t,\w]*))/i

I tried it at https://regex101.com/ and while it works with 'Copyright © 2019 Company All Rights Reserved', it doesn't work with '© 2019 Company All Rights Reserved'. How can I change it so that it also matches when the word Copyright is not there?


Solution

  • I think that pattern can be simplified for your example data, because it contains superfluous grouping structures. You can also omit the negative lookahead at the start, which asserts that the string does not contain {, } or );

    (?:copyright[ \t]*)?(?:&copy;|\(c\)|&#(?:169|xa9;)|©)[ \t]+(?:19|20)[0-9]{2} Company(?: All Rights Reserved)?
    

    Regex demo

    You can extend the pattern to your requirements.

    That will match

    • (?: Non capturing group
      • copyright[ \t]* Match copyright, then match 0+ times a space or tab (keep the case-insensitive flag i from the original pattern so that Copyright also matches)
    • )? Close non capturing group and make it optional
    • (?: Non capturing group
      • &copy;|\(c\)|&#(?:169|xa9;)|© Match any of the listed items in the alternation
    • )[ \t]+ Close non capturing group and match 1+ times a space or tab
    • (?:19|20)[0-9]{2} Company Match 19 or 20 followed by 2 digits, then a space and the literal text Company
    • (?: All Rights Reserved)? Optionally match All Rights Reserved
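
As a rough illustration of the pattern above, here is a minimal Python sketch that runs it against the three sample strings from the question. The names COPYRIGHT_RE, GENERAL_RE, find_copyright and parse_copyright are made up for this example, and the first alternative is written as the HTML entity &copy;, which is an assumption about the original pattern. GENERAL_RE is one possible way to extend the pattern so that the holder is not hard-coded as Company; it is not part of the answer, and it assumes the statement runs to the end of the string.

```python
import re

# The simplified pattern from the answer, compiled case-insensitively so that
# "Copyright" and "(C)" match as well as their lowercase forms.
COPYRIGHT_RE = re.compile(
    r"(?:copyright[ \t]*)?"                    # optional word "copyright"
    r"(?:&copy;|\(c\)|&#(?:169|xa9;)|©)[ \t]+"  # a copyright symbol variant
    r"(?:19|20)[0-9]{2} Company"               # year 19xx/20xx, literal holder
    r"(?: All Rights Reserved)?",              # optional trailer
    re.IGNORECASE,
)

# Hypothetical extension: capture the year and an arbitrary holder name.
# Assumes the statement extends to the end of the searched string.
GENERAL_RE = re.compile(
    r"(?:copyright[ \t]*)?"
    r"(?:&copy;|\(c\)|&#(?:169|xa9;)|©)[ \t]+"
    r"(?P<year>(?:19|20)[0-9]{2})[ \t]+"
    r"(?P<holder>.+?)"                         # lazy, so the trailer is not swallowed
    r"(?:[ \t]+All Rights Reserved)?$",
    re.IGNORECASE,
)


def find_copyright(text):
    """Return the matched copyright statement, or None if there is none."""
    match = COPYRIGHT_RE.search(text)
    return match.group(0) if match else None


def parse_copyright(text):
    """Return (year, holder) using the extended pattern, or None."""
    match = GENERAL_RE.search(text)
    return (match.group("year"), match.group("holder")) if match else None


samples = [
    "Copyright © 2019 Company All Rights Reserved",
    "© 2019 Company All Rights Reserved",
    "© 2019 Company",
]

for sample in samples:
    # Each sample is matched in full by the simplified pattern.
    print(find_copyright(sample) == sample)
```

Note that both patterns still require one of the symbol variants, so a string such as 'Copyright 2019 Company' without ©, (c) or an entity is not matched.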