ruby, regex, posix

How can I apply regexp POSIX class subtraction or equivalent?


I'm trying to do this in Ruby, but I suppose the question applies to any regex flavor that supports POSIX classes.

Goal: I want to replace every character that matches the [[:space:]] POSIX class, except a tab, with a single regular space.

Hoping character class subtraction would work with POSIX classes, I tried this, but it doesn't work:

value.gsub!(/[ [[:space:]] - [\t] ]/, ' ')

Is there a way to rewrite this so I can match and replace any of the characters found in the [[:space:]] class except the tab with a single regular space character?

Update

Thanks for all of the answers.

My question focused on the [[:space:]] POSIX class because this class extends beyond ASCII whitespace and control characters and also includes non-ASCII whitespace characters in Unicode. So while I agree I could build my own class listing every possible whitespace character, I'd rather use the existing class that already includes them and remove what I don't want from it.
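To illustrate why [[:space:]] is worth reusing, here is a minimal sketch comparing it with \s on two non-ASCII whitespace characters. The constant names are made up for this example, and it assumes UTF-8 strings (the default for Ruby source files).

```ruby
# Hypothetical sample characters, chosen only for illustration.
NBSP = "\u00A0"              # no-break space
IDEOGRAPHIC_SPACE = "\u3000" # ideographic (CJK) space

# In Ruby, \s only covers the ASCII whitespace characters.
puts NBSP.match?(/\s/)                        # => false

# The POSIX bracket class is Unicode-aware.
puts NBSP.match?(/[[:space:]]/)               # => true
puts IDEOGRAPHIC_SPACE.match?(/[[:space:]]/)  # => true

# The tab is part of the class too, which is why it has to be excluded.
puts "\t".match?(/[[:space:]]/)               # => true
```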

Initial testing shows that the three answers below:

value.gsub!(/(?!\t)[[:space:]]/, ' ')  # appears to be language agnostic regexp approach which is good if needed

value.gsub!(/[[:space:]&&[^\t]]/, ' ') # for languages that don't actually support true class subtraction 

value.gsub!(/[^[:^space:]\t]/, ' ') # inverse or double negative approach

produce the desired results. I like the first two best. Since I originally framed the question in terms of Ruby, and one answer points out that Ruby doesn't actually support class subtraction but offers intersection with a negated class instead, I am choosing that answer. It seems good to know even for non-POSIX classes.
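A quick way to repeat that comparison is to run all three patterns over one sample string. This is only a sketch: the sample string and the constant names are my own, and it uses the non-mutating gsub so the sample can be reused.

```ruby
# Hypothetical sample: no-break space, tab, newline and em space between letters.
SAMPLE = "a\u00A0b\tc\nd\u2003e"

# The three patterns from the answers, keyed by made-up labels.
PATTERNS = {
  lookahead:       /(?!\t)[[:space:]]/,
  intersection:    /[[:space:]&&[^\t]]/,
  double_negative: /[^[:^space:]\t]/
}

# Apply each pattern; the tab should survive, everything else becomes ' '.
RESULTS = PATTERNS.transform_values { |re| SAMPLE.gsub(re, ' ') }

RESULTS.each { |name, out| puts "#{name}: #{out.inspect}" }
# Each line should show "a b\tc d e"
```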


Solution

  • You may use

    /[[:space:]&&[^\t]]/
    

    See the Rubular demo

    Details

    • [ - start of a character class (bracket expression)
      • [:space:] - a POSIX character class matching whitespace chars
      • && - a character class intersection operator
      • [^\t] - any char other than a tab
    • ] - end of a character class (bracket expression).

    See more about how to use character class subtraction in Ruby.
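As a rough usage sketch of the pattern above: normalize_spaces is a hypothetical helper name, not something from the question, and the second call only shows that && works the same way with an ordinary range.

```ruby
# Hypothetical helper: replace every [[:space:]] character except a tab
# with a single regular space, returning a new string.
def normalize_spaces(text)
  text.gsub(/[[:space:]&&[^\t]]/, ' ')
end

# The tab is kept; the no-break space, \r and \n each become one space.
puts normalize_spaces("col1\tcol2\u00A0x\r\ny").inspect
# => "col1\tcol2 x  y"

# The same intersection with a non-POSIX class: lowercase letters except vowels.
puts "regexp".gsub(/[a-z&&[^aeiou]]/, '*')
# => *e*e**
```

Note that gsub! (as used in the question) returns nil when nothing was replaced, so prefer gsub when you need the resulting string as a return value.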