I cannot find the bug related to the global hash-table in the following SBCL code:
(defparameter *lci-hash-table* (make-hash-table :size 10000))
"Hash-table key = length|char|index concatenated together (eg, 5A0),
hash-table values = list of matching symbols (eg, (ABATE ABBEY ...))."
(defun dictionary-compatible ($new-cross-str) ;eg, "A " of length 5
"Tests if a string (with uppercase alphabetic and space characters)
is compatible with the dictionary words (symbols) in a hash-table."
(iter (with len = (length $new-cross-str))
(for char in-sequence $new-cross-str)
(for index from 0)
(when (not (eql char #\Space))
(collect (gethash (ut::intern-symbol len char index) ;this gethash
*lci-hash-table*) into dict-words)) ;always returns nil
(finally (return (reduce #'intersection dict-words)))))
(defun intern-symbol (&rest args)
"Interns a symbol created by concatenating args.
Based on symb in Let Over Lambda."
(flet ((mkstr (&rest args)
(with-output-to-string (s)
(dolist (a args) (princ a s)))))
(values (intern (apply #'mkstr args)))))
The function dictionary-compatible
above runs fine in single-threaded mode in SBCL. But starting over with a new image in multi-threaded mode (using the :lparallel library) the gethash access to the global *lci-hash-table*
always returns NIL.
The *lci-hash-table*
is only being accessed, never updated, and therefore, according to the SBCL manual, there is no need to make it :synchronized, or defined with sb-ext:defglobal.
I think I'm missing something that's related to multi-processing for hash-tables in SBCL.
Edit: For background info, here is the code to populate the hash-table:
(defun create-lci-ht (dictionary-file)
"Read dictionary words (symbols) from a file into strings
and store in a lci hash table."
(with-open-file (infile dictionary-file :direction :input :if-does-not-exist nil)
(when (not (streamp infile)) (error "File does not exist!"))
(with-open-file (stream dictionary-file)
(let ((word-strings (uiop:read-file-lines stream)))
(iter (for word-string in word-strings)
(for word-length = (length word-string))
(while word-string)
(iter (for char in-sequence word-string)
(for index from 0)
(push (intern word-string)
(gethash (ut::intern-symbol word-length char index)
*lci-hash-table*))))))))
This code exhibits two classic mistakes in CL: mistakes which everyone, including me, has made at some point!
The first mistake is that you are explicitly creating and looking up symbols in a package without ever specifying the package, so you are simply relying on whatever the ambient value of *package*
is when the code is running. That's, at the very least, fragile.
(in-package ...)
in a source file sets the package during the time the code in the file is being compiled and loaded but not the ambient package outside the dynamic extent of the compilation / loading, since both load
and compile-file
rebind *package*
to its ambient value.with-standard-io-syntax
rebind *package*
.*package*
, and special variables in general, with multithreading is implementation-dependent and shims like Bordeaux Threads do not (and probably can not!) hide this.As an example of the last point, consider this code:
;;; Assume Bordeaux threads is loaded
;;;
(defpackage :one
(:use :cl :bordeaux-threads))
(defpackage :two
(:use :cl :bordeaux-threads))
(in-package :one)
(start-multiprocessing)
(defun test (p1 p2)
(setf p1 (find-package p1)
p2 (find-package p2))
(let ((*package* p1))
(multiple-value-bind (tp1 tp2)
(join-thread
(make-thread
(lambda ()
(values *package*
(setf *package* p2)))))
(values (package-name tp1) (package-name tp2)
(package-name *package*)))))
Now in SBCL:
> (test :one :two)
"COMMON-LISP-USER"
"TWO"
"ONE"
> (test :one :two)
"TWO"
"TWO"
"ONE"
Yes: different results!
In LW:
> (test :one :two)
"ONE"
"TWO"
"ONE"
> (test :one :two)
"ONE"
"TWO"
"ONE"
I don't understand SBCL's behaviour but I also have not read a lot of SBCL's manual in detail. I think what is probably happening is that in SBCL threads do not inherit special bindings, and *package*
is bound somewhere in the REPL thread: this means that the assignment in the thread is to the global value of *package*
. In LW threads do inherit their parent's bindings I think.
The way of resolving this is to always be specific about what package you are using if you are calling functions like intern
. This is what is actually breaking your code.
The second mistake is that you are using packages to map strings to symbols and then using a hashtable to map symbols to values. In other words:
OK, think about that: you're doing two lookups to get the value you're after. Instead, do one:
equal
hashtable directly.The first approach is essentially just using a package as a hashtable: this is fine so long as you are careful that this is all you use it for (so make sure you don't use it for the symbols which make up your program as well!), but it's ugly for two reasons:
So generally, if what you want is a map from strings to values, where those strings are not the names of parts of your program, the best approach is just to use an equal
hashtable for this.
If you really do need to use a package for some reason, and you want intern-symbol
to take an unknown number of arguments, then probably you want intern-symbol
either to take the package as its first argument:
(defun intern-symbol (package &rest args)
(intern (format nil "~{~A~}" args) package))
Or you might provide an explicit variable
(defvar *my-hashtable-package* (make-package "MY-HASHTABLE-PACKAGE" :use '()))
(defun intern-symbol (&rest args)
(intern (format nil "~{~A~}" args) *my-hashtable-package*))
As a final note: intern-symbol
is not reliable in general:
> (eq (intern-symbol 1 23) (intern-symbol 1 2 3))
t
> (eq (intern-symbol 1 23) (intern-symbol 1 "2" 3))
t
Of course it may be fine in constrained cases.