php, regex, replace, sanitization

Remove non-letters and letters not preceded by specified characters or at the start of the string


I'm having some difficulty converting a regex that works with preg_match_all() so that it works with preg_replace().

Basically, using regex only, I would like to match uppercase characters that are preceded by a space, the beginning of the text, or a hyphen. That part isn't a problem; the following works well:

preg_match_all('/(?<= |\A|-)[A-Z]/', $str, $results);
echo '<pre>' . print_r($results,true) . '</pre>';
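A self-contained version of the snippet above, for experimenting. The sample string is a hypothetical stand-in (the question does not show the real `$str`), and it prints the matches with print_r() instead of wrapping them in `<pre>`, assuming the script is run from the command line:

```php
<?php
// Hypothetical sample input; the question does not show the real $str.
$str = "Hello World-Foo bar aBc";

// The lookbehind requires the uppercase letter to follow a space,
// the start of the subject (\A), or a hyphen.
preg_match_all('/(?<= |\A|-)[A-Z]/', $str, $results);

// $results[0] holds the full-pattern matches (there are no capture groups).
print_r($results[0]);
echo implode('', $results[0]), "\n"; // HWF
```

The lowercase "bar" never matches, and the `B` in "aBc" is skipped because it follows a letter.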

Now I'd like to use preg_replace() to return a string containing only the uppercase characters that match the criteria above. If I port the regex straight into preg_replace(), it naturally replaces the very characters I want to keep.
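To illustrate the problem, here is a minimal sketch using the same hypothetical sample string: handing the matching pattern directly to preg_replace() deletes the qualifying capitals and keeps everything else, which is the inverse of the goal.

```php
<?php
// Hypothetical sample input, same as the preg_match_all() experiment.
$str = "Hello World-Foo bar aBc";

// Porting the match pattern as-is removes exactly the letters we want to keep.
$naive = preg_replace('/(?<= |\A|-)[A-Z]/', '', $str);
echo $naive, "\n"; // ello orld-oo bar aBc
```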

I'm fully aware that regex isn't the most efficient solution here, but I would nonetheless like to use preg_replace().


Solution

  • According to De Morgan's laws,
    if you want to keep letters that are

    • A-Z, and
    • preceded by [space], \A, or -

    then you'd want to remove characters that are

    • not A-Z, or
    • not preceded by [space], \A, or -

    Perhaps this pattern, replacing each match with an empty string:

    /[^A-Z]|(?<! |\A|-)./
    

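A minimal sketch of the replacement approach, assuming ASCII input. The helper names `keptCapitals()` and `matchedCapitals()` and the sample strings are hypothetical. It applies the negated pattern with preg_replace(), then cross-checks each result against the question's preg_match_all() pattern, which is the De Morgan equivalence in practice:

```php
<?php
// Keep: A-Z preceded by a space, the start of the subject, or a hyphen.
// Remove (De Morgan): anything that is not A-Z, OR any character not so preceded.
function keptCapitals(string $str): string
{
    return preg_replace('/[^A-Z]|(?<! |\A|-)./', '', $str);
}

// The question's original approach, used here only as a cross-check.
function matchedCapitals(string $str): string
{
    preg_match_all('/(?<= |\A|-)[A-Z]/', $str, $m);
    return implode('', $m[0]);
}

// Hypothetical sample strings, including a newline and an empty string.
$samples = ["Hello World-Foo bar aBc", "ABC DEF", "x-Y z\nQ", "A-B", ""];
foreach ($samples as $s) {
    $kept = keptCapitals($s);
    $matched = matchedCapitals($s);
    printf("%-28s => %-4s (%s)\n", json_encode($s), $kept,
        $kept === $matched ? 'same as preg_match_all' : 'MISMATCH');
}
```

preg_replace() evaluates every lookbehind against the original subject, not the partially rewritten string, so removing a hyphen or space does not stop the capital after it from being kept: `keptCapitals('A-B')` yields `AB`. Also, without the `s` modifier `.` does not match a newline, but the `[^A-Z]` branch already removes newlines, so the `Q` after `\n` is the only thing the second branch has to drop.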