
Markdown Text Highlighting Performance Issues - Tkinter


Overview

I’m trying to add markdown syntax highlighting to a text editor for my project, but I am having trouble making it user-proof, so to speak, while keeping it performance-friendly.

Basically, I'm after this, from Visual Studio Code's markdown highlighting:

[Image: Visual Studio Code markdown highlighting]

I’m talking about simple highlighting of bold, italic, lists, etc. to indicate the style that will be applied when the user previews their markdown file.

My Solution

I originally set up this method for my project (simplified for the question, and using colours to make the styles clearer for debugging):

import re
import tkinter

root = tkinter.Tk()
root.title("Markdown Text Editor")
editor = tkinter.Text(root)
editor.pack()

# bind each key Release to the markdown checker function
editor.bind("<KeyRelease>", lambda event : check_markdown(editor.index('insert').split(".")[0]))


# configure markdown styles
editor.tag_config("bold",           foreground = "#FF0000") # red for debugging clarity
editor.tag_config("italic",         foreground = "#00FF00") # green for debugging clarity
editor.tag_config("bold-italic",    foreground = "#0000FF") # blue for debugging clarity


# regex expressions and empty tag length
# (raw strings, so the backslashes reach the regex engine unchanged)
search_expressions = {
#   <tag name>    <regex expression>   <empty tag size>
    "italic" :      [r"\*(.*?)\*",          2],
    "bold" :        [r"\*\*(.*?)\*\*",      4],
    "bold-italic" : [r"\*\*\*(.*?)\*\*\*",  6],
}


def check_markdown(current_line):
    # loop through each tag with the matching regex expression
    for tag, expression in search_expressions.items():
        # start and end indices for the search area
        start_index, end_index = f"{current_line}.0", f"{current_line}.end"

        # remove all tag instances
        editor.tag_remove(tag, start_index, end_index)
        
        # while there is still text to search
        while 1:
            length = tkinter.IntVar()
            # get the index of 'tag' that matches 'expression' on the 'current_line'
            index = editor.search(expression[0], start_index, count = length, stopindex = end_index, regexp = True)
            
            # break if the expression was not met on the current line
            if not index: 
                break
            
            # else is this tag empty ('**' <- empty italic)
            elif length.get() != expression[1]: 
                # apply the tag to the markdown syntax
                editor.tag_add(tag, index, f"{index}+{length.get()}c")

            # continue searching after the markdown
            start_index = index + f"+{length.get()}c"

            # update the display - stops program freezing
            root.update_idletasks()

            continue

        continue

    return

root.mainloop()

I reasoned that removing all formatting on each KeyRelease and then rescanning the current line would reduce the amount of misinterpreted syntax (such as bold-italic being tagged as bold or italic) and stop tags stacking on top of each other. This works well for a few sentences on a single line, but if the user types lots of text on one line, the performance drops fast, with long waits before the styles are applied, especially when lots of different markdown syntax is involved.
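
The overlap between the three patterns can be seen without Tkinter at all. The following is a minimal sketch, assuming Python's re module as a stand-in for the regex engine behind Text.search() (the two engines are not identical), with a hypothetical helper matches_per_tag() that is not part of the original code:

```python
import re

# The three patterns from the question, written as raw strings.
patterns = {
    "italic": r"\*(.*?)\*",
    "bold": r"\*\*(.*?)\*\*",
    "bold-italic": r"\*\*\*(.*?)\*\*\*",
}


def matches_per_tag(line):
    """Return {tag: [(start, end), ...]} for every pattern on one line.

    Hypothetical helper for illustration only.
    """
    return {
        tag: [m.span() for m in re.finditer(pattern, line)]
        for tag, pattern in patterns.items()
    }


# A single bold-italic run is matched by all three patterns.
sample = "***both***"
for tag, spans in matches_per_tag(sample).items():
    print(tag, spans)
```

Each pattern is a separate pass over the line, and every pass finds something in the same bold-italic run, so the final look depends on which tag wins where they overlap.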

I used Visual Studio Code's markdown language highlighting as a comparison, and it could handle far more syntax on a single line before it removed the highlighting for "performance reasons".

I understand this is an extremely large amount of looping to be doing on every KeyRelease, but I found the alternatives to be vastly more complicated, while not really improving the performance.

Alternative Solutions

I thought, let’s decrease the load. I’ve tested running the check only when the user types markdown syntax such as asterisks and m-dashes, and validating any tag that has been edited (a key release within a tag's range). But there are so many variables to consider with the user's input, such as text being pasted into the editor: it is difficult to determine what effect certain syntax combinations could have on the markdown of the surrounding document, so all of these would need to be checked and validated.

Is there some better and more intuitive method to highlight markdown that I haven’t thought of yet? Is there a way to drastically speed up my original idea? Or are Python and Tkinter simply not able to do what I’m trying to do fast enough?

Thanks in advance.


Solution

  • If you don't want to use an external library and want to keep the code simple, using re.finditer() seems faster than Text.search().

    You can use a single regular expression to match all cases:

    regexp = re.compile(r"((?P<delimiter>\*{1,3})[^*]+?(?P=delimiter)|(?P<delimiter2>\_{1,3})[^_]+?(?P=delimiter2))")
    

    The length of the matched "delimiter" (or "delimiter2") group gives you the tag, and the span of the match gives you where to apply the tag.

    Here is the code:

    import re
    import tkinter
    
    root = tkinter.Tk()
    root.title("Markdown Text Editor")
    editor = tkinter.Text(root)
    editor.pack()
    
    # bind each key Release to the markdown checker function
    editor.bind("<KeyRelease>", lambda event: check_markdown())
    
    # configure markdown styles
    editor.tag_config("bold", foreground="#FF0000") # red for debugging clarity
    editor.tag_config("italic", foreground="#00FF00") # green for debugging clarity
    editor.tag_config("bold-italic", foreground="#0000FF") # blue for debugging clarity
    
    regexp = re.compile(r"((?P<delimiter>\*{1,3})[^*]+?(?P=delimiter)|(?P<delimiter2>\_{1,3})[^_]+?(?P=delimiter2))")
    tags = {1: "italic", 2: "bold", 3: "bold-italic"}  # the length of the delimiter gives the tag
    
    
    def check_markdown(start_index="insert linestart", end_index="insert lineend"):
        text = editor.get(start_index, end_index)
        # remove all tag instances
        for tag in tags.values():
            editor.tag_remove(tag, start_index, end_index)
        # loop through each match and add the corresponding tag
        for match in regexp.finditer(text):
            groupdict = match.groupdict()
            delim = groupdict["delimiter"] # * delimiter
            if delim is None:
                delim = groupdict["delimiter2"]  # _ delimiter
            start, end = match.span()
            editor.tag_add(tags[len(delim)], f"{start_index}+{start}c", f"{start_index}+{end}c")
        return
    
    root.mainloop()
    

    Note that check_markdown() only works if start_index and end_index are on the same line; otherwise you need to split the text and do the search line by line.
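
The line-by-line search mentioned in the note could look roughly like the sketch below. It reuses the regexp and tags from the answer; find_tag_ranges() and its first_line parameter are hypothetical names introduced here for illustration. It assumes that Tk column numbers equal Python string offsets, which should hold for ordinary text, and it returns "line.column" index strings instead of touching the widget, so it can be tried without a Tk window.

```python
import re

# Same pattern and tag table as in the answer above.
regexp = re.compile(
    r"((?P<delimiter>\*{1,3})[^*]+?(?P=delimiter)"
    r"|(?P<delimiter2>\_{1,3})[^_]+?(?P=delimiter2))"
)
tags = {1: "italic", 2: "bold", 3: "bold-italic"}


def find_tag_ranges(text, first_line=1):
    """Return (tag, start, end) triples using Tk 'line.column' indices.

    Hypothetical helper: 'first_line' is the number of the editor line
    on which 'text' begins.
    """
    ranges = []
    # Search each line separately so a match never spans a newline.
    for offset, line in enumerate(text.split("\n")):
        line_no = first_line + offset
        for match in regexp.finditer(line):
            # Exactly one of the two delimiter groups is set per match.
            delim = match.group("delimiter") or match.group("delimiter2")
            start, end = match.span()
            ranges.append(
                (tags[len(delim)], f"{line_no}.{start}", f"{line_no}.{end}")
            )
    return ranges


for tag, start, end in find_tag_ranges("*a* and **b**\n___c___"):
    print(tag, start, end)
```

Each triple can then be passed to editor.tag_add(tag, start, end), with first_line taken from the line part of editor.index(start_index).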