Tags: python, tkinter, unicode, tcl, emoji

Tkinter and 32-bit Unicode duplicating – any fix?


I only want to show Chip, but I get both Chip AND Dale. It doesn't seem to matter which 32-bit (non-BMP) character I put in; tkinter duplicates all of them, not just chipmunks.

I'm thinking that I may have to render them to png and then place them as images, but that seems a bit ... heavy-handed.

Any other solutions? Is tkinter planning on fixing this?

import tkinter as tk

# Python 3.8.3
class Application(tk.Frame):
    def __init__(self, master=None):
        self.canvas = None
        self.quit_button = None
        tk.Frame.__init__(self, master)
        self.grid()
        self.create_widgets()

    def create_widgets(self):
        self.canvas = tk.Canvas(self, width=500, height=420, bg='yellow')
        # U+1F43F (chipmunk) lies outside the Basic Multilingual Plane
        self.canvas.create_text(250, 200, font="* 180", text='\U0001F43F')
        self.canvas.grid()

        self.quit_button = tk.Button(self, text='Quit', command=self.quit)
        self.quit_button.grid()

app = Application()
app.master.title('Emoji')
app.mainloop()

[Screenshot: Chip and Dale on Mac OS]

  • Apparently this works fine on Windows, so maybe it's a macOS issue.
  • I've run it on two separate Macs, both on the latest macOS (Catalina 10.15.5), and both show the problem.
  • The bug shows with the standard Python installer from python.org: Python 3.8.3 with Tcl/Tk 8.6.8.
  • Supposedly it might be fixed with Tcl/Tk 8.6.10, but I don't see how I can upgrade Tcl/Tk using the normal installer.
  • This is also reported as a bug; cf. https://bugs.python.org/issue41212
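
For anyone who wants to confirm which Tcl/Tk their Python is actually linked against, here is a minimal sketch. `tkinter.Tcl()` and the Tcl command `info patchlevel` are standard; the helper names `parse_patchlevel`, `at_least` and `current_patchlevel` are made up for illustration, and the 8.6.10 threshold only mirrors the "supposedly fixed" report above, not a confirmed cutoff.

```python
import re


def parse_patchlevel(patchlevel):
    """Turn a Tcl patch level such as '8.6.8' or '8.7a5' into (major, minor, patch)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", patchlevel)
    if match is None:
        raise ValueError("unrecognised patch level: %r" % patchlevel)
    major, minor, patch = match.groups()
    # Alpha/beta levels like '8.7a5' carry no third number; treat it as 0.
    return int(major), int(minor), int(patch or 0)


def at_least(patchlevel, minimum):
    """True if the given patch level is >= the (major, minor, patch) tuple."""
    return parse_patchlevel(patchlevel) >= minimum


def current_patchlevel():
    """Ask the Tcl interpreter bundled with this Python for its patch level."""
    import tkinter  # imported lazily so the helpers above work without Tk

    # Tcl() creates a bare Tcl interpreter; no window or display is needed.
    return tkinter.Tcl().eval("info patchlevel")
```

Something like `at_least(current_patchlevel(), (8, 6, 10))` would then tell you whether you are on the older 8.6.8 build described above or on something newer.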

One of the Python contributors believes that Tcl/Tk cannot or will not support variable-width encodings (it always converts internally to a fixed-width encoding), which suggests to me that Tcl/Tk is not suitable for general UTF-8 development.
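
To make the variable-width versus fixed-width point concrete, here is a small sketch that counts how many code units a character needs in each encoding. It uses only `str.encode`; the helper name `encoded_widths` is hypothetical.

```python
def encoded_widths(text):
    """Count the code units needed to store `text` in three Unicode encodings."""
    return {
        "utf-8 bytes": len(text.encode("utf-8")),
        # The '-le' variants are used so that no byte-order mark is counted.
        "utf-16 code units": len(text.encode("utf-16-le")) // 2,
        "utf-32 code units": len(text.encode("utf-32-le")) // 4,
    }


# 'A' is ASCII, U+00E9 is in the BMP, U+1F43F (the chipmunk) is outside it.
for sample in ("A", "\u00e9", "\U0001F43F"):
    print("U+%04X" % ord(sample), encoded_widths(sample))
```

Only UTF-32 stays at one unit per character for all three samples. The chipmunk needs four bytes in UTF-8 and two code units in UTF-16, which is the case a fixed 16-bit internal representation has trouble with.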


Solution

  • The fundamental problem is that Tcl and Tk are not very happy with non-BMP characters (those outside the Unicode Basic Multilingual Plane). Prior to 8.6.10, what happens is anyone's guess; the implementation simply assumed such characters didn't exist and was known to be buggy when they actually turned up (there are several tickets on various aspects of this). 8.7 will have stronger fixes in place (see TIP #389 for the details). The basic aim is that if you feed non-BMP characters in, they can be got out at the other side, so they can be written to a UTF-8 file or displayed by Tk if the font engine deigns to support them. Some operations will still be wrong, though, as the string implementation will still be using surrogates. 9.0 will fix things properly (by changing the fundamental character storage unit to be large enough to accommodate any Unicode codepoint), but that's a disruptive change.

    With released versions, if you can get the surrogates over the wall from Python to Tcl, they'll probably end up in the GUI engine, which might do the right thing in some cases (not including any build I've currently got, FWIW, but I've got strange builds, so don't read very much into that). With 8.7, sending over UTF-8 will work; that's part of the functionality profile that will be guaranteed. (The encoding functions exist in older versions, but with 8.6 releases they will do the wrong thing with non-BMP UTF-8, and they break weirdly with versions older than that.)
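
    For readers unsure what "surrogates" means here, the sketch below splits each non-BMP character into its UTF-16 surrogate pair using the standard UTF-16 arithmetic. The function name `to_surrogate_pairs` is made up for illustration. Whether tkinter will pass such a string through to Tcl intact, and whether Tk then renders it correctly, depends on the build, so treat this as an illustration of the representation rather than a guaranteed workaround.

```python
def to_surrogate_pairs(text):
    """Return `text` with every non-BMP character replaced by its UTF-16 surrogate pair."""
    pieces = []
    for char in text:
        codepoint = ord(char)
        if codepoint > 0xFFFF:
            # Standard UTF-16 split: subtract 0x10000, then take the
            # top 10 bits and the bottom 10 bits of what remains.
            offset = codepoint - 0x10000
            pieces.append(chr(0xD800 + (offset >> 10)))    # high surrogate
            pieces.append(chr(0xDC00 + (offset & 0x3FF)))  # low surrogate
        else:
            pieces.append(char)
    return "".join(pieces)


pair = to_surrogate_pairs("\U0001F43F")
# Print the code units as hex; printing the lone surrogates themselves
# would raise UnicodeEncodeError on most terminals.
print([hex(ord(unit)) for unit in pair])  # ['0xd83d', '0xdc3f']
```

    The single chipmunk character becomes two 16-bit units, which is how a UTF-16-style string implementation stores it internally.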