Ruby split string and preserve separator


In Ruby, what's the easiest way to split a string in the following manner?

  • 'abc+def' should split to ['abc', '+', 'def']

  • 'abc*def+eee' should split to ['abc', '*', 'def', '+', 'eee']

  • 'ab/cd*de+df' should split to ['ab', '/', 'cd', '*', 'de', '+', 'df']

The idea is to split the string on these symbols: ['-', '+', '*', '/'], and to keep each symbol in the result at the position where it occurred.


Solution

  • Option 1

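    /\b/ matches a word boundary. The match is zero-width, so split cuts the string at that position without consuming any characters. This works for the examples here because the pieces between the separators consist only of word characters.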

    'abc+def'.split(/\b/)
    # => ["abc", "+", "def"]
    
    'abc*def+eee'.split(/\b/)
    # => ["abc", "*", "def", "+", "eee"]
    
    'ab/cd*de+df'.split(/\b/)
    # => ["ab", "/", "cd", "*", "de", "+", "df"]
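
A quick sketch of where Option 1 stops working, using made-up inputs that are not from the question: /\b/ fires at every transition between word and non-word characters, so spaces become tokens and adjacent operators stay glued together.

```ruby
# \b matches between a word character ([A-Za-z0-9_]) and anything else,
# so every non-word run becomes its own token, not only - + * /.
p 'a b+c'.split(/\b/)   # => ["a", " ", "b", "+", "c"]

# Two operators in a row have no word boundary between them.
p '1+-2'.split(/\b/)    # => ["1", "+-", "2"]
```

If inputs like these are possible, a pattern that names the separators explicitly is the safer choice.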
    

    Option 2

    If your string contains other word-boundary characters and you only want to split on -, +, *, and /, use a capture group. When the pattern contains a capture group, String#split also includes the captured text in the result. (Thanks to @Jordan for pointing this out.)

    'abc+def'.split /([+*\/-])/
    # => ["abc", "+", "def"]
    
    'abc*def+eee'.split /([+*\/-])/
    # => ["abc", "*", "def", "+", "eee"]
    
    'ab/cd*de+df'.split /([+*\/-])/
    # => ["ab", "/", "cd", "*", "de", "+", "df"]
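
As a minimal sketch, the capture-group pattern can also be built from the separator list itself. The names SEPARATORS and split_keeping are hypothetical, not part of the original answer, and dropping empty strings is an assumption about what the caller wants.

```ruby
# Hypothetical helper; SEPARATORS and split_keeping are illustrative names.
SEPARATORS = %w(- + * /)

def split_keeping(str, separators = SEPARATORS)
  # Regexp.union escapes each string, so "*" and "+" are matched literally.
  # Wrapping the union in ( ) makes split keep the matched separator.
  pattern = /(#{Regexp.union(separators)})/
  # A leading separator or two adjacent separators yield empty strings
  # from split; this sketch assumes they are unwanted and drops them.
  str.split(pattern).reject(&:empty?)
end

p split_keeping('ab/cd*de+df')  # => ["ab", "/", "cd", "*", "de", "+", "df"]
p split_keeping('-a+-b')        # => ["-", "a", "+", "-", "b"]
```

Building the pattern with Regexp.union avoids hand-escaping the character class when the separator list changes.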
    

    Option 3

    Lastly, if your language does not support splitting with a capture group, you can use two lookarounds instead. Lookarounds are also zero-width matches, so they do not consume any characters.

    'abc+def'.split /(?=[+*\/-])|(?<=[+*\/-])/
    # => ["abc", "+", "def"]
    
    'abc*def+eee'.split /(?=[+*\/-])|(?<=[+*\/-])/
    # => ["abc", "*", "def", "+", "eee"]
    
    'ab/cd*de+df'.split /(?=[+*\/-])|(?<=[+*\/-])/
    # => ["ab", "/", "cd", "*", "de", "+", "df"]
    

    The idea here is to split at every position that is immediately followed by one of your separators, or immediately preceded by one. Here is a little visual:

    ab ⍿ / ⍿ cd ⍿ * ⍿ de ⍿ + ⍿ df

    Each ⍿ marks a position that is either preceded or followed by one of the separators, so this is where the string gets cut.
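
The capture-group and lookaround patterns agree on the examples above but differ on edge cases. The sketch below uses made-up inputs with adjacent and leading separators; the constant names are only for illustration.

```ruby
CAPTURE    = /([+*\/-])/
LOOKAROUND = /(?=[+*\/-])|(?<=[+*\/-])/

# Adjacent separators: the capture group leaves an empty string between
# them, while the lookarounds only cut at positions and never do.
p 'a+-b'.split(CAPTURE)     # => ["a", "+", "", "-", "b"]
p 'a+-b'.split(LOOKAROUND)  # => ["a", "+", "-", "b"]

# Leading separator: same difference at the start of the string.
p '-a+b'.split(CAPTURE)     # => ["", "-", "a", "+", "b"]
p '-a+b'.split(LOOKAROUND)  # => ["-", "a", "+", "b"]
```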


    Option 4

    Maybe your language doesn't have a string split function, or a sensible way to work with regular expressions. You don't have to sit around guessing whether some clever built-in procedure will solve the problem for you: there is almost always a way to do it with basic instructions.

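    class String
      # First character, or nil for an empty string.
      def head
        self[0]
      end
      # Everything after the first character.
      def tail
        self[1..-1]
      end
      # Left fold over the characters; recurses once per character.
      def reduce acc, &f
        if empty?
          acc
        else
          tail.reduce yield(acc, head), &f
        end
      end
      # Splits on any of chars, keeping each separator as its own element.
      def separate chars
        # res collects finished tokens; acc is the token being built.
        res, acc = reduce [[], ''] do |(res, acc), char|
          if chars.include? char
            [res + [acc, char], '']
          else
            [res, acc + char]
          end
        end
        res + [acc]
      end
    end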
    
    'abc+def'.separate %w(- + / *)
    # => ["abc", "+", "def"]
    
    'abc*def+eee'.separate %w(- + / *)
    # => ["abc", "*", "def", "+", "eee"]
    
    'ab/cd*de+df'.separate %w(- + / *)
    # => ["ab", "/", "cd", "*", "de", "+", "df"]
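
The recursive reduce above goes one call deeper per character, so very long strings may run into Ruby's stack limit. A minimal iterative sketch of the same idea follows; separate_iter is a hypothetical standalone method, not part of the original answer, and it does not reopen String.

```ruby
# Hypothetical iterative variant of String#separate from Option 4.
def separate_iter(str, separators)
  tokens  = []
  current = ''.dup            # mutable buffer for the token being built
  str.each_char do |char|
    if separators.include?(char)
      # Finish the current token, then emit the separator itself.
      tokens << current << char
      current = ''.dup
    else
      current << char
    end
  end
  tokens << current           # whatever follows the last separator
end

p separate_iter('ab/cd*de+df', %w(- + / *))
# => ["ab", "/", "cd", "*", "de", "+", "df"]
```

Like the recursive version, this keeps an empty string between two adjacent separators and after a trailing one.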