regex emacs elisp syntax-highlighting font-lock

How to test `font-lock-keywords` values for Emacs Lisp code

I pose the question because I think both the question and possible answers might help Emacs users who write Lisp code that defines font-lock-keywords. I'm providing one answer that I think helps. I'm also interested in other answers.

That variable's value is a list of expressions, each of which can specify one or more patterns to match or functions to perform matching, and one or more faces for highlighting the matching text. The possibilities for font-lock-keywords values are numerous and complicated. (The doc describing this is the Elisp manual, node Search-based Fontification.)

In most cases the list has more than one element, which means more than one regexp pattern. These can interact in different ways. Some can prevent others from taking effect, or they can alter the effect of others. My library Dired+, for instance, defines font-lock-keywords in Dired mode with 31 entries (regexps), many of which interact.

How to keep all of that straight? How do you debug such a list when you are defining it or modifying it? You might comment out all but one of the list items, to see its effect when alone. And then repeat for another. And then perhaps add a few together, and maybe in different orders. There are various possibilities, I suppose, but just what do you do?

(OK, I know that most Elisp coders do not write super complex font-lock-keywords definitions. But even for simple definitions this can become complicated. And perhaps if this process were easier then users would not unnecessarily limit themselves to only one or two entries.)

Solution

You could use my newly released Font Lock Studio. The following is from the readme-file:

font-lock-studio - interactive debugger for Font Lock keywords

Font Lock Studio is an interactive debugger for Font Lock keywords (Emacs syntax highlighting rules).

Introduction

Font Lock Studio lets you single-step Font Lock keywords -- matchers, highlights, and anchored rules, so that you can see what happens when a buffer is fontified. You can set breakpoints on or inside rules and run until one has been hit. When inside a rule, matches are visualized using a palette of background colors. The explainer can describe a rule in plain-text english. Tight integration with Edebug allows you to step into Lisp expressions that are part of the Font Lock keywords.

When using the debugger, an interface buffer is displayed, it contains all the keywords and is used for navigation and visalization of match data.

When Font Lock Studio is started, comments and strings are pre-colored, as they are part of the earlier syntactic phase (which isn't supported by Font Lock Studio).

Start the debugger by typing "M-x font-lock-studio RET". Press ? or see the menu for available commands.

Example

For a buffer using html-mode, the interface buffer looks the following. Other major modes typically have more and more complex rules. The arrow on the left indicates the current active location. A corresponding arrow in the source buffer is placed at the current search location.

        ========================
        === Font Lock Studio ===
        ========================
    --------------------------------------------------
=>  "<\\([!?][_:[:alpha:]][-_.:[:alnum:]]*\\)"
      (1 font-lock-keyword-face)
    --------------------------------------------------
    "</?\\([_[:alpha:]][-_.[:alnum:]]*\\)\\(?::\\([_:[:alpha:]]
    [-_.:[:alnum:]]*\\)\\)?"
      (1
       (if
           (match-end 2)
           sgml-namespace-face font-lock-function-name-face))
      (2 font-lock-function-name-face nil t)
    --------------------------------------------------
    "\\(?:^\\|[ \t]\\)\\([_[:alpha:]][-_.[:alnum:]]*\\)\\(?::
    \\([_:[:alpha:]][-_.:[:alnum:]]*\\)\\)?=[\"']"
      (1
       (if
           (match-end 2)
           sgml-namespace-face font-lock-variable-name-face))
      (2 font-lock-variable-name-face nil t)
    --------------------------------------------------
    "[&%][_:[:alpha:]][-_.:[:alnum:]]*;?"
      (0 font-lock-variable-name-face)
    --------------------------------------------------
    "<\\(b\\(?:ig\\|link\\)\\|cite\\|em\\|h[1-6]\\|rev\\|s\\(?:
    mall\\|trong\\)\\|t\\(?:itle\\|t\\)\\|var\\|[bisu]\\)
    \\([ \t][^>]*\\)?>\\([^<]+\\)</\\1>"
      (3
       (cdr
        (assoc-string
         (match-string 1)
         sgml-tag-face-alist t))
       prepend)
    ==================================================
    Public state:
      Debug on error     : YES
      Debug on quit      : YES
      Explain rules      : YES
      Show compiled code : NO

Press space to single step through all the keywords. "n" will go the the next keyword, "b" will set a breakpoint, "g" will run to the end (or to the next breakpoint) and "q" will quit.

Features

Stepping

You can single step into, over, and out of Font Lock keywords. Anchored rules are fully supported. In addition, you can run to the end or to the next breakpoint.

Breakpoints

You can set breakpoints on part of the keyword, like the matcher (e.g. the regexp), a highlight rule, or inside an anchored highlight rule.

If you want to step or run without stopping on breakpoints, prefix the command with C-u.

Note that in an anchored rule, you can set a breakpoints either on the entire rule or on an individual part. In the former case, only the outer parentheses are highlighted.

Match Data Visualization

After the matcher of a keyword or anchored highlight has been executed, the match data (whatever the search found) is visualized using background colors in the source buffer, in the regexp, and over the corresponding highlight rule or rules. If part of a regexp or a highlight didn't match, it is not colored, this can for example happen when the postfix regexp operator ? is used.

Note that an inner match group gets precedence over an outer group. This can lead to situations where a highlight rule gets a color that doesn't appear in the regexp or in the source buffer. For example, the matcher "\(abc\)" will be colored with the color for match 1, while the higlight rule `(0 a-face)' gets the color for match 0.

Normalized keywords

The keywords presented in the interface have been normalized. For example, instead of

     ("xyz" . font-lock-type-face)

the keyword

      ("xyz" (0 font-lock-type-face))

is shown. See font-lock-studio-normalize-keywords for details.

Explainer

The explainer echoes a human-readble description of the current part of the Font Lock keywords. This help you to understand that all those nil:s and t:s in the rules actually mean.

When using the auto explainer, Font Lock Studio echoes the explanation after each command.

Edebug -- the Emacs Lisp debugger

Tight integration with Edebug allows you to single-step expressions embedded in the keywords in the interface buffer, and it allows you to instrument called functions for debugging in their source file.

Follow mode awareness

The search location in the source buffer is visualized by an overlay arrow and by updating the point. If the source buffer is visible in multiple side-by-side windows and Follow mode is enabled, the search location will be shown in a suitable windows to minimize scrolling.