Common Lisp hash-table with accented characters as keys


I'm trying to create a hash table in Common Lisp that stores characters as keys, but it doesn't work when I use accented characters: it accepts only one accented key.

In this example I add 5 keys and the hash table reports 5 elements. I then add another 5 keys with accents and it reports 6 elements. Finally I add another 5 “normal” keys and the count goes to 11 (as expected).

What is happening? And how can I solve this?

(defparameter *h* (make-hash-table))
(setf (gethash #\A *h*) #\A)
(setf (gethash #\E *h*) #\A)
(setf (gethash #\I *h*) #\A)
(setf (gethash #\O *h*) #\A)
(setf (gethash #\U *h*) #\A)
(hash-table-count *h*) ; reported result: 5
(setf (gethash #\á *h*) #\A)
(setf (gethash #\é *h*) #\A)
(setf (gethash #\í *h*) #\A)
(setf (gethash #\ó *h*) #\A)
(setf (gethash #\ú *h*) #\A)
(hash-table-count *h*) ; reported result: 6, although 5 new keys were added
(setf (gethash #\a *h*) #\A)
(setf (gethash #\e *h*) #\A)
(setf (gethash #\i *h*) #\A)
(setf (gethash #\o *h*) #\A)
(setf (gethash #\u *h*) #\A)
(hash-table-count *h*) ; reported result: 11

Solution

  • From the SBCL manual:

    On non-Unicode builds, the default external format is :latin-1.
    

    You want to use UTF-8. So do what the manual says, and set your environment up when you call SBCL:

    $ LANG=C.UTF-8 sbcl --noinform --no-userinit --eval "(print (map 'string #'code-char (list 97 98 246)))" --quit
    "abö"
    $ LANG=C sbcl --noinform --no-userinit --eval "(print (map 'string #'code-char (list 97 98 246)))" --quit
    "ab?"
    

    If you use SLIME or Sly from Emacs, there is a way to set it up in your init:

    (setq sly-lisp-implementations
          '((sbcl ("/opt/sbcl/bin/sbcl") :coding-system utf-8-unix)))
    

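    Then use a sane test function. The default hash-table test, eql, already compares characters exactly; char= is the character-specific predicate, but standard make-hash-table only guarantees eq, eql, equal, and equalp as tests, so using char= there relies on an implementation extension. You should use the most specific predicate whenever possible, in my opinion. char-equal is the case-insensitive version of char=.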

    The snippet above comes from the Sly manual; it works in SLIME too, with the variable named slime-lisp-implementations.

    As noted in the comment by @Manuel, if your LANG variable and friends do not use UTF-8, then you are doomed. See this question.
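
To check whether the hash table or the character decoding is at fault, one option is to build the same table from code points rather than character literals. The sketch below is only an illustration: string-codes and *vowel-codes* are hypothetical names that do not appear in the question or the answer, and it assumes a Unicode-capable implementation where code-char returns a character for each of these codes.

```lisp
;; Hypothetical helper (not part of the original answer): return the code
;; points the Lisp image actually holds for each character of STRING.
(defun string-codes (string)
  (map 'list #'char-code string))

;; The question's 15 keys, written as code points so that they do not
;; depend on how the source text or terminal input was decoded.
(defparameter *vowel-codes*
  '(65 69 73 79 85         ; A E I O U
    225 233 237 243 250    ; U+00E1 U+00E9 U+00ED U+00F3 U+00FA (accented vowels)
    97 101 105 111 117))   ; a e i o u

(defparameter *h* (make-hash-table :test 'eql)) ; eql is the default test

(dolist (code *vowel-codes*)
  (setf (gethash (code-char code) *h*) #\A))

(hash-table-count *h*) ; 15 when every code maps to its own character
```

If this version reports 15 while the version with literals does not, the table itself is fine and the accented literals are being decoded incorrectly. Calling string-codes on a one-character string containing an accented literal typed at the REPL is a quick way to confirm this: a precomposed á read as UTF-8 should yield the single code 225, and anything else points to an external-format mismatch.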