regex, openrefine

Regex to delete all-caps names and the following comma


I have a CSV of names like "Smith, SMITH, John, JOHN", and I'm trying to use a regex in OpenRefine to remove the names in all caps.

replace(value, /^[A-Z]$/, '') does nothing, and replace(value, /[A-Z]/, '') gets rid of all names with any capital letters and leaves a trail of stray commas.

I need to delete the all-caps names and any commas that follow them. I'm not interested in preserving the list by lowercasing the names or capitalizing only the first letter of each; any name in all caps must be deleted.


Solution

  • Use

    replace(value, /, *[A-Z]+\b/, '')
    


    EXPLANATION

    --------------------------------------------------------------------------------
      ,                        a literal ','
    --------------------------------------------------------------------------------
       *                       ' ' (0 or more times, matching as many as
                               possible)
    --------------------------------------------------------------------------------
      [A-Z]+                   any character from 'A' to 'Z' (1 or more
                               times, matching as many as possible)
    --------------------------------------------------------------------------------
      \b                       the boundary between a word character (\w) and
                               something that is not a word character
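
To check the pattern outside OpenRefine, here is a minimal Python sketch. OpenRefine evaluates GREL regexes with Java's engine, so this rests on the assumption that Python's `re` treats this particular pattern the same way. The function name `drop_all_caps` is made up for illustration.

```python
import re

# Same pattern as in the GREL expression above: a comma, optional spaces,
# then a run of capitals that ends at a word boundary.
ALL_CAPS_AFTER_COMMA = re.compile(r", *[A-Z]+\b")


def drop_all_caps(value: str) -> str:
    """Remove each all-caps name together with the comma that precedes it."""
    return ALL_CAPS_AFTER_COMMA.sub("", value)


sample = "Smith, SMITH, John, JOHN"

# The pattern from the answer.
print(drop_all_caps(sample))            # Smith, John

# The two attempts from the question, for comparison.
print(re.sub(r"^[A-Z]$", "", sample))   # unchanged: the cell is not a single capital letter
print(re.sub(r"[A-Z]", "", sample))     # strips every capital letter, leaving stray commas
```

Note that the pattern removes the comma before an all-caps name, not the one after it. That works for the sample because every all-caps name follows its mixed-case twin, but an all-caps name at the very start of the cell (for example "SMITH, Smith") is left untouched. A single capital followed by a period, such as the initial in ", J.", would also match.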