parsing, syntax-highlighting, textmate, sublimetext

Where can I find a library that parses source code and is able to extract the scope of where your cursor is currently in the code?


In Sublime Text 2, pressing ctrl + shift + p (Mac OS X) shows the scope at the caret's current position in the source code, e.g.: entity.name.tag.inline.any.html meta.tag.inline.any.html text.html.basic

I am curious about what library or script is used to parse the document/file and create that scope string.

A sidenote: Typing view.syntax_name(view.sel()[0].b) into Sublime's console will output the scope as well.


Solution

  • Well, the "libraries" you are referring to are just the language grammars.

    Language grammars are rule sets for dividing a document's syntax into scopes.

    In other words, each rule in a grammar just assigns one or more elements of syntax to a scope.

    As for the actual parsing, TextMate and Sublime Text do it with regular expressions.

    For instance, consider the Python language grammar. When I put my cursor at the beginning of while and press ctrl-shift-P (show scope), the scope appears in the status bar:

    source.python keyword.control.flow.python
    

    Again, this scope is defined in the Python language grammar, so we can find the specific rule:

    { 
      match = '\b(elif|else|except|finally|for|if|try|while|with|break|continue|pass|raise|return|yield)\b';
    
      name = 'keyword.control.flow.python';
    }
    

    The first item, match, is the regular-expression pattern that is passed to the parser.

    The second item, name, is the name given to that particular syntax element (i.e., the scope).
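
To make the match/name idea concrete, here is a minimal sketch in Python of how such a rule could be used to look up the scope at a cursor position. It is not how TextMate or Sublime Text are implemented: the `scope_at` function, the `RULES` table and the `source.python` base scope are assumptions made for illustration, and Python's `re` module stands in for the editor's own regex engine. Scope names are joined with spaces, as in the example in the question.

```python
import re

# Hypothetical, simplified rule table: (compiled pattern, scope name) pairs.
# The single entry is the rule quoted above; the base scope is an assumption.
BASE_SCOPE = "source.python"
RULES = [
    (
        re.compile(
            r"\b(elif|else|except|finally|for|if|try|while|with|"
            r"break|continue|pass|raise|return|yield)\b"
        ),
        "keyword.control.flow.python",
    ),
]


def scope_at(line, column):
    """Return the scope string for the character at `column` in `line`."""
    scopes = [BASE_SCOPE]
    for pattern, name in RULES:
        for m in pattern.finditer(line):
            # The rule applies if the cursor sits inside the matched text.
            if m.start() <= column < m.end():
                scopes.append(name)
                break
    return " ".join(scopes)


print(scope_at("while x < 10:", 0))  # cursor at the start of "while"
print(scope_at("while x < 10:", 6))  # cursor on "x", no rule matches
```

With the cursor on `while` the sketch reports the base scope followed by the keyword scope; anywhere else on that line it reports only the base scope.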

    Sublime Text 2 stores language grammars in the Packages directory as files with the .tmLanguage extension. These are XML property lists rather than the plain-text property-list format shown above, so the same rule from the Python language grammar looks like this in ST2:

    <dict>
        <key>match</key>
        <string>\b(elif|else|except|finally|for|if|try|while|with|break|....[truncated]</string>

        <key>name</key>
        <string>keyword.control.flow.python</string>
    </dict>
    

    So every rule in the grammar is wrapped in a pair of dict tags, and each rule's regexp pattern is wrapped in string tags; ditto for its corresponding scope name.

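    Language grammars are built largely from rules like this one, each pairing a pattern with a scope name. Besides simple match/name rules, the TextMate grammar format also allows begin/end rules for constructs that span more than one line.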
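
Because a .tmLanguage file is an XML property list, its rules can be read with Python's standard `plistlib` module. The sketch below is only an illustration: `SAMPLE` is a tiny made-up stand-in for a real grammar file (with a shortened pattern), `simple_rules` is a hypothetical helper, and it assumes the usual layout in which the rules sit in a top-level `patterns` array. It only collects simple match/name rules and ignores everything else.

```python
import plistlib

# A tiny stand-in for a .tmLanguage file (an assumption for illustration;
# a real grammar has many more rules and keys).
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<plist version="1.0">
<dict>
    <key>scopeName</key>
    <string>source.python</string>
    <key>patterns</key>
    <array>
        <dict>
            <key>match</key>
            <string>\\b(if|while|return)\\b</string>
            <key>name</key>
            <string>keyword.control.flow.python</string>
        </dict>
    </array>
</dict>
</plist>
"""


def simple_rules(data):
    """Return (scope name, regex) pairs for top-level match/name rules."""
    grammar = plistlib.loads(data)
    return [
        (rule["name"], rule["match"])
        for rule in grammar.get("patterns", [])
        if "match" in rule and "name" in rule
    ]


for name, pattern in simple_rules(SAMPLE):
    print(name, "<-", pattern)
```

To try this on a real grammar, the bytes of a .tmLanguage file from the Packages directory could be passed to `simple_rules` in place of `SAMPLE`.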