Tags: regex, parsing, editor, syntax-highlighting, lexical-analysis

What are alternatives to regexes for syntax highlighting?


While editing various files in Vim, I often find that its syntax highlighting has defects for some filetypes. I can't recall a specific example at the moment, but someone surely will. Usually it's strings that get highlighted incorrectly in some cases, problems around arithmetic and boolean operators, and a few other small things.

Now, Vim uses regexes (its own flavour) for that kind of thing.

However, I've started to come across editors which, at first glance, handle syntax highlighting better. I've always thought that regexes were the way to go for this.

So I'm wondering: do those editors just have better-written regexes, or do they handle it in some other way? If so, how? How is syntax highlighting done when you want it to be "stable"? And in your opinion, which editor handles it best (your editor of choice), and how does it do it (language-wise)?

Edit-1: For example, take editors like Emacs, Notepad2, Notepad++ and Visual Studio. Do you happen to know what mechanism they use for syntax highlighting?


Solution

  • The first thing that comes to mind as a replacement for regexes in syntax highlighting is parsing. Regexes have a lot of advantages, but as we see with Vim's highlighting, there are limits. (If you look for threads about using regexes to parse XML, you'll find extensive material on why regexes can't do what parsers do.)

    Since what we want from syntax highlighting is for it to follow the syntactic structure of the language, which regexes can only approximate, you need to perform some level of real parsing to go beyond what regexes can do. A simple hand-written lexer, possibly feeding a recursive descent parser, will probably do well for most languages, I'm thinking.
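
To make the idea concrete, here is a minimal sketch of a hand-written scanner for a made-up toy language. It is not how Vim or any particular editor works. The `Token` type, the `tokenize` function, the `KEYWORDS` set and the choice of `/* ... */` as a nestable comment syntax are all assumptions for illustration. It tracks comment nesting depth and string escapes explicitly, which are the kinds of cases that a single classical regex tends to get wrong.

```python
from typing import Iterator, NamedTuple

# Hypothetical keyword set for a toy language (an assumption, not a real grammar).
KEYWORDS = {"if", "else", "while", "return"}


class Token(NamedTuple):
    kind: str  # "comment", "string", "number", "keyword", "ident", "space", "other"
    text: str


def tokenize(src: str) -> Iterator[Token]:
    """Split src into highlightable tokens; every character ends up in some token."""
    i, n = 0, len(src)
    while i < n:
        start = i
        ch = src[i]
        if src.startswith("/*", i):
            # Nested block comment: keep a depth counter until it returns to zero.
            depth = 0
            while i < n:
                if src.startswith("/*", i):
                    depth += 1
                    i += 2
                elif src.startswith("*/", i):
                    depth -= 1
                    i += 2
                    if depth == 0:
                        break
                else:
                    i += 1
            yield Token("comment", src[start:i])
        elif ch == '"':
            # String literal: a backslash escapes the next character.
            i += 1
            while i < n and src[i] != '"':
                i += 2 if src[i] == "\\" else 1
            i = min(i + 1, n)  # consume the closing quote if there is one
            yield Token("string", src[start:i])
        elif ch.isdigit():
            while i < n and src[i].isdigit():
                i += 1
            yield Token("number", src[start:i])
        elif ch.isalpha() or ch == "_":
            while i < n and (src[i].isalnum() or src[i] == "_"):
                i += 1
            word = src[start:i]
            yield Token("keyword" if word in KEYWORDS else "ident", word)
        elif ch.isspace():
            while i < n and src[i].isspace():
                i += 1
            yield Token("space", src[start:i])
        else:
            i += 1
            yield Token("other", ch)


if __name__ == "__main__":
    sample = 'if x /* a /* b */ c */ return "q\\"r" 42'
    for tok in tokenize(sample):
        if tok.kind != "space":
            print(f"{tok.kind:8} {tok.text}")
```

An editor would map each `kind` to a colour. Since the tokens cover the input without gaps, joining their text gives back the original source, so nothing is lost when the buffer is repainted. A full recursive descent parser built on top of such a token stream could go further, for example by distinguishing a function name from a variable name by its position in the grammar.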