
Sublime Text messes up Ruby syntax highlighting when using Ruby regexes


The Ruby syntax highlighting does not work properly when using regexes.

  • Here is the Ruby syntax highlighting issue:

[Screenshot: broken Ruby syntax highlighting in Sublime Text]

It looks like multiple issues are happening here.

  1. It seems to interpret the string interpolations (#{...}) inside the regex as comments (#), which breaks the highlighting from that point to the end of the line.
  2. The mix of " and ' on the string_literal line appears to break the highlighting from that point to the end of the file, which is much more serious.
  • Here is the example as code:
class Tokenizer
  def initialize(expression)
    @expression = expression
  end

  TOKEN_REGEX = /
    (?<whitespace>\s+) |
    (?<parenthesis>[\(\)]) |
    (?<comparison_operator>#{ComparisonNode::OPERATORS.map { |op| Regexp.escape(op) }.join('|')}) |
    (?<logical_operator>\b(?:#{LogicalNode::OPERATORS.join('|')})\b) |
    (?<boolean_literal>\b(?:#{ValueNode::BOOLEAN_LITERALS.join('|')})\b) |
    (?<number_literal>\d+) |
    (?<string_literal>"[^"]*"|'[^']*') |
    (?<identifier>[a-z_][a-z0-9_\.]*) |
    (?<unknown>.)
  /ix.freeze

  def tokenize
    tokens = []
    @expression.scan(TOKEN_REGEX) do
      match_data = Regexp.last_match
      if match_data[:whitespace]
        next
      elsif match_data[:parenthesis]
        tokens << Token.new(:parenthesis, match_data[0])
      elsif match_data[:comparison_operator]
        tokens << Token.new(ComparisonNode::TYPE, match_data[0])
      elsif match_data[:logical_operator]
        tokens << Token.new(LogicalNode::TYPE, match_data[0].upcase)
      elsif match_data[:boolean_literal]
        tokens << Token.new(:literal, match_data[0].downcase)
      elsif match_data[:number_literal]
        tokens << Token.new(:literal, match_data[0])
      elsif match_data[:string_literal]
        value = match_data[0][1...-1] # Remove surrounding quotes
        tokens << Token.new(:literal, value)
      elsif match_data[:identifier]
        tokens << Token.new(FieldNode::TYPE, match_data[0])
      else
        raise "Unexpected character: #{match_data[0]}"
      end
    end
    tokens
  end
end

This happens with the built-in Ruby syntax highlighting in Sublime Text 3 (Version 3.2.2, Build 3211). I also tried installing packages that aim to fix Ruby syntax highlighting, such as Sublime Better Ruby, but without success.

Has anyone run into the same issue? If so, how did you fix it? Thanks!


Solution

  • Sublime Text's Ruby syntax definition takes the opinionated view that multi-line regexps generally use the %r literal syntax.

    So the / / form is only highlighted correctly when the leading and trailing forward slashes are on the same line.

    This is visible in the try-regex context of Ruby.sublime-syntax, shown here as of v3211 because that is your stated version. The same rule applies to all versions up through v4108; it appears this was patched in v4109.

     try-regex:
        # Generally for multiline regexes, one of the %r forms below will be used,
        # so we bail out if we can't find a second / on the current line
        - match: '\s*(/)(?![*+{}?])(?=.*/)'
          captures:
            1: string.regexp.classic.ruby punctuation.definition.string.ruby
          push:
            - meta_content_scope: string.regexp.classic.ruby
            - match: "(/)([eimnosux]*)"
              scope: string.regexp.classic.ruby
              captures:
                1: punctuation.definition.string.ruby
                2: keyword.other.ruby
              pop: true
            - include: regex-sub
        - match: ''
          pop: true
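
To see what the (?=.*/) lookahead does, the opener pattern from the rule can be tried against single lines in Ruby. This is a rough approximation: it assumes Ruby's regex engine treats this simple pattern the same way Sublime's does, and it ignores the exact position at which Sublime attempts the match. The helper name opens_classic_regex? is hypothetical.

```ruby
# Opener pattern copied from the try-regex rule (single-quoted, as in the YAML).
# Assumption: Ruby's engine behaves like Sublime's for this simple pattern.
OPENER = Regexp.new('\s*(/)(?![*+{}?])(?=.*/)')

# Hypothetical helper: does this line contain a "/" that the rule would accept
# as the start of a classic /.../ regex? The (?=.*/) lookahead only succeeds
# when a second "/" appears later on the same line.
def opens_classic_regex?(line)
  OPENER.match?(line)
end

puts opens_classic_regex?("  TOKEN_REGEX = /")  # false: no second / on this line
puts opens_classic_regex?("  x = /abc/i")       # true: closing / on the same line
```

Because the first line of the question's TOKEN_REGEX has no second slash, the rule pops without entering the regexp scope, and the rest of the pattern is highlighted as ordinary Ruby code.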
    

    Knowing this, you can alter your code to:

    TOKEN_REGEX = %r{
        (?<whitespace>\s+) |
        (?<parenthesis>[\(\)]) |
        (?<comparison_operator>#{ComparisonNode::OPERATORS.map { |op| Regexp.escape(op) }.join('|')}) |
        (?<logical_operator>\b(?:#{LogicalNode::OPERATORS.join('|')})\b) |
        (?<boolean_literal>\b(?:#{ValueNode::BOOLEAN_LITERALS.join('|')})\b) |
        (?<number_literal>\d+) |
        (?<string_literal>"[^"]*"|'[^']*') |
        (?<identifier>[a-z_][a-z0-9_\.]*) |
        (?<unknown>.)
      }ix.freeze
    

    and the syntax highlighting works as expected.
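Switching delimiters only changes how the literal is written, not the Regexp Ruby builds from it. A quick sanity check, using a hypothetical OPS list in place of LogicalNode::OPERATORS and a cut-down pattern:

```ruby
# Hypothetical stand-in for LogicalNode::OPERATORS.
OPS = ["AND", "OR"]

# Multi-line literal with / / delimiters (the form Sublime mis-highlights).
slash_form = /
  (?<word>\b(?:#{OPS.join('|')})\b) |
  (?<number>\d+)
/ix

# The same pattern with %r{ } delimiters (the form Sublime highlights correctly).
percent_r_form = %r{
  (?<word>\b(?:#{OPS.join('|')})\b) |
  (?<number>\d+)
}ix

# Identical source text and flags, so the two Regexp objects compare equal.
puts slash_form == percent_r_form           # true
p percent_r_form.match("or")[:word]         # "or" (matched case-insensitively via /i)
```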


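    As an aside, Regexp::union builds an alternation from an Array of values and escapes each string for you, so you don't need to join or escape manually. Interpolating the resulting Regexp object directly embeds it as (?-mix:...), which switches off the outer /i flag inside that group, so interpolate its source instead to keep the pattern case-insensitive:

       (?<comparison_operator>#{Regexp.union(ComparisonNode::OPERATORS).source}) |
       (?<logical_operator>\b(?:#{Regexp.union(LogicalNode::OPERATORS).source})\b) |
       (?<boolean_literal>\b(?:#{Regexp.union(ValueNode::BOOLEAN_LITERALS).source})\b) |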
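A small sketch of how a Regexp.union result behaves when interpolated into a case-insensitive pattern, with a hypothetical LOGICAL_OPERATORS list standing in for LogicalNode::OPERATORS. Since the tokenizer upcases logical operators and downcases booleans, it presumably expects mixed-case input, which is where the difference matters.

```ruby
# Hypothetical stand-in for LogicalNode::OPERATORS.
LOGICAL_OPERATORS = ["AND", "OR"]

# Regexp.union escapes each string and joins them into one alternation.
words = Regexp.union(LOGICAL_OPERATORS)
p Regexp.union(["+", "*"]).source   # "\\+|\\*" (special characters are escaped)

# Interpolating the Regexp itself inserts its to_s form, "(?-mix:AND|OR)",
# which turns case-insensitivity back off inside that group.
with_to_s = /\b(?:#{words})\b/i

# Interpolating .source inserts only "AND|OR", so the outer /i still applies.
with_source = /\b(?:#{words.source})\b/i

p words.to_s                  # "(?-mix:AND|OR)"
p with_to_s.match?("and")     # false
p with_source.match?("and")   # true
```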