
How can I use text with html tags in a tkinter text box, or change it so that it works in a tkinter label?


I've been given a lot of text and asked to display it in a tkinter app. The text has a lot of HTML tags like <em>...</em> and <sup>...</sup>, where the text needs to be italicized or superscript.

Is there any way built into tkinter to do this? If not, is it even possible to write a function to, for example, italicize all text between <em> tags, then delete the tags?

I know I would be able to remove the tags by doing something like:

for tag in ["<em>", "</em>", "<sup>", "</sup>"]:
    text = "".join(text.split(tag))

But I really need to, at least, italicize the text between <em> tags before removing them.

I'm new to tkinter, and I've been watching a lot of tutorials and googling for solutions, but it seems like tkinter can't natively handle HTML tags, and I can't find any solution.

EDIT:

I need to display this in a regular tkinter text widget.

I know I can use tkinter's font method with slant=italic to set text in a text box to italic. I just need a way to apply that setting to everything between <em> tags.


Solution

  • I worked this out myself over the last few days. First, you have to find the places in the text that you want to italicize, removing the HTML tags from the text as you go along. Next, you put the tag-free text into a text widget. Finally, you identify the points in the widget's text to italicize.

    It's a bit finicky, because identifying points in the text widget's text requires a decimal-style index: the number before the decimal point is the line number, and the number after it is the index of the character in that line. This means you need a line number for each index, so you need a way of knowing exactly where one line ends and another begins. Also, line 2, character 4 is 2.4, and line 2, character 40 is 2.40, so float(f"{line_number}.{character_number}") won't work: it drops the trailing zero and turns 2.40 into 2.4. Use Decimal(f"{line_number}.{character_number}") instead, which keeps the trailing zero (the widget also accepts the index as a plain string such as "2.40").
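
To make the trailing-zero problem concrete, here is a minimal sketch comparing the three ways of building the index for line 2, character 40. The variable names are only illustrative and are not taken from the code further down.

```python
from decimal import Decimal

# Illustrative values: line 2, character 40.
line_number, character_number = 2, 40

as_float = float(f"{line_number}.{character_number}")
as_decimal = Decimal(f"{line_number}.{character_number}")
as_string = f"{line_number}.{character_number}"

print(str(as_float))    # 2.4  -> would point at character 4, not 40
print(str(as_decimal))  # 2.40 -> trailing zero preserved
print(as_string)        # 2.40 -> already in "line.column" form
```

The float version silently points at a different character, which is why it has to be avoided here.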

    For example, in the text alphabet = 'abcd efg hijk\nlmnop qrs tuv wx yz', if you want to italicize all of the letters from "h" to "p", you first need an index for "h" to start italicizing at, start = alphabet.find("h"), and then one just after "p" to stop italicizing at, end = alphabet.find("p") + 1. Next, you have to find which line the start point and end point are on, and translate the indices (9 and 19 respectively) to decimal format (1.9 and 2.5):

    start_line = alphabet[:start].count("\n") + 1
    end_line = alphabet[:end].count("\n") + 1
    line_start_point = len(alphabet[alphabet[:start].rfind("\n") + 1: start])
    line_end_point = len(alphabet[alphabet[:end].rfind("\n") + 1: end])
    start_point = Decimal(f"{start_line}.{line_start_point}")
    end_point = Decimal(f"{end_line}.{line_end_point}")
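
The same conversion can be wrapped in a small helper. This is only a sketch: offset_to_index is a hypothetical name, and it returns the index as a "line.column" string rather than a Decimal, on the assumption that the widget is handed the string form.

```python
def offset_to_index(text, offset):
    """Convert a flat character offset into a 'line.column' index string.

    Hypothetical helper: lines are numbered from 1 and columns from 0,
    matching the decimal format described above.
    """
    before = text[:offset]
    line = before.count("\n") + 1
    # rfind returns -1 when there is no newline, so the column is then
    # simply the offset itself.
    column = offset - (before.rfind("\n") + 1)
    return f"{line}.{column}"


alphabet = "abcd efg hijk\nlmnop qrs tuv wx yz"
start = alphabet.find("h")      # 9
end = alphabet.find("p") + 1    # 19

print(offset_to_index(alphabet, start))  # 1.9
print(offset_to_index(alphabet, end))    # 2.5
```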
    

    Anyway, here's all of the code I ended up using to remove the unnecessary <sup>...</sup> tags and anything between them, and to italicize everything between <em>...</em> tags:

    import re
    from decimal import Decimal
    from tkinter import *
    from tkinter import font
    
    def em_points(text):
        suppat = re.compile(r'<sup>\w*</sup>')
        suppatiter = suppat.findall(text)
        if suppatiter:
            for suptag in suppatiter:
                text = "".join(text.split(suptag))
        finds = list()
        if "<em>" in text:
            find_points = list()
            emcount = text.count("<em>")
            for _ in range(emcount):
                find_open = text.find("<em>")
                text = text[:find_open] + text[find_open + 4:]
                find_close = text.find("</em>")
                text = text[:find_close] + text[find_close + 5:]
                find_points.append([find_open, find_close])
            for points in find_points:
                finds.append(text[points[0]: points[1]])
        return [text, finds]
    
    def italicize_text(text_box, finds):
        italics_font = font.Font(text_box, text_box.cget("font"))
        italics_font.configure(slant="italic")
        text_box.tag_configure("italics", font=italics_font)
        text_in_box = text_box.get(1.0, END)
        used_points = list()
        for find in finds:
            if find not in text_in_box:
                raise RuntimeError(f"Could not find text to italicise in textbox:\n    {find}\n    {text_in_box}")
            else:
                start_point = text_in_box.find(find)
                end_point = start_point + len(find)
                found_at = [start_point, end_point]
                if found_at in used_points:
                    while found_at in used_points:
                        reduced_text = text_in_box[end_point:]
                        start_point = end_point + reduced_text.find(find)
                        end_point = start_point + len(find)
                        found_at = [start_point, end_point]
                used_points.append(found_at)
                text_to_startpoint = text_in_box[:start_point]
                text_to_endpoint = text_in_box[:end_point]
                start_line = text_to_startpoint.count("\n") + 1
                end_line = text_to_endpoint.count("\n") + 1
                if "\n" in text_to_startpoint:
                    line_start_point = len(text_in_box[text_to_startpoint.rfind("\n") + 1: start_point])
                else:
                    line_start_point = start_point
                if "\n" in text_to_endpoint:
                    line_end_point = len(text_in_box[text_to_endpoint.rfind("\n") + 1: end_point])
                else:
                    line_end_point = end_point
                start_point = Decimal(f"{start_line}.{line_start_point}")
                end_point = Decimal(f"{end_line}.{line_end_point}")
                text_box.tag_add("italics", start_point, end_point)
    
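    # Hypothetical sample input; replace it with your own marked-up text.
    text = "Plain text, <em>italic text</em>, footnote<sup>1</sup>."
    
    root = Tk()
    em_text = em_points(text)
    clean_text = em_text[0]
    em_list = em_text[1]
    
    text_box = Text(root, width=80, height=5, font=("Courier", 12))
    text_box.insert(1.0, clean_text)
    italicize_text(text_box, em_list)
    text_box.pack()
    root.mainloop()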
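
As a possible alternative, and not what the code above does, the tags could be parsed in a single pass with the standard library's html.parser, recording the character offsets of each <em> run instead of searching for the text again afterwards. The sketch below assumes, like the code above, that <sup> content should be dropped. ItalicSpanParser, parse_markup and fill_text_widget are hypothetical names. It also relies on the text widget's "1.0 + N chars" index form, which counts characters from the start of the text, so no line and column arithmetic is needed.

```python
from html.parser import HTMLParser


class ItalicSpanParser(HTMLParser):
    """Hypothetical parser: keeps tag-free text and the offsets of <em> runs."""

    def __init__(self):
        super().__init__()
        self.parts = []        # text chunks kept for display
        self.length = 0        # number of characters kept so far
        self.spans = []        # (start, end) offsets of italic runs
        self.em_start = None   # offset where the currently open <em> began
        self.sup_depth = 0     # greater than 0 while inside <sup>...</sup>

    def handle_starttag(self, tag, attrs):
        if tag == "em" and self.em_start is None:
            self.em_start = self.length
        elif tag == "sup":
            self.sup_depth += 1

    def handle_endtag(self, tag):
        if tag == "em" and self.em_start is not None:
            self.spans.append((self.em_start, self.length))
            self.em_start = None
        elif tag == "sup" and self.sup_depth:
            self.sup_depth -= 1

    def handle_data(self, data):
        if self.sup_depth:
            return  # assumption: superscript content is discarded
        self.parts.append(data)
        self.length += len(data)


def parse_markup(markup):
    """Return (clean_text, spans) for a string containing <em>/<sup> tags."""
    parser = ItalicSpanParser()
    parser.feed(markup)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.parts), parser.spans


def fill_text_widget(text_box, markup):
    """Insert the clean text into a tkinter Text widget and italicize spans."""
    from tkinter import font  # imported here so parsing works without a display

    clean_text, spans = parse_markup(markup)
    text_box.insert("1.0", clean_text)
    italics_font = font.Font(text_box, text_box.cget("font"))
    italics_font.configure(slant="italic")
    text_box.tag_configure("italics", font=italics_font)
    for start, end in spans:
        # "1.0 + N chars" counts characters from the start, newlines included.
        text_box.tag_add("italics", f"1.0 + {start} chars", f"1.0 + {end} chars")
```

With a Text widget created as above, calling fill_text_widget(text_box, text) would replace the em_points and italicize_text calls. Because the offsets come straight from the parse, a phrase that appears both emphasized and unemphasized in the text should not be confused with its other occurrence.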