Restricting usage of keywords in Common Lisp (SBCL)


Using SBCL's implementation of Common Lisp, is it somehow possible to present a REPL where usage of specific keywords is prohibited (effectively providing access to only a subset of Common Lisp's functionality)?


Solution

    (defpackage :limited-repl
      (:use :cl :named-readtables)
      (:export #:within-sandbox
               #:limited-repl))
    
    (in-package :limited-repl)
    

    In this answer I define a within-sandbox macro and a limited-repl function, such that:

    1. the current package is a fresh package that exists only during the execution of the sandbox, to avoid polluting a single package with a lot of different symbols entered by the user
    2. only an allowed list of symbols is visible to the user
    3. special characters like #\: and #\# are forbidden, to prevent the user from accessing symbols in other packages, and to rule out syntax that might create arrays, bit vectors, etc., as well as reader labels (#1= and #1#), which could be used to create circular structures that loop infinitely when evaluated.
    4. the empty list is not equivalent to COMMON-LISP:NIL, in order to avoid introducing this symbol in the custom language; instead, it reads as zero.

    readtable

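    I am using the named-readtables library, which is available in Quicklisp, in order to define a custom readtable. Load it before evaluating the defpackage form above, since that form uses its package. The same result can be achieved with standard Common Lisp alone (copy-readtable and set-macro-character) if necessary.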

    (ql:quickload :named-readtables)
    

    Let's define a custom condition for forbidden characters, which is a bit cleaner than using (error "some string"):

    (define-condition forbidden-character (error)
      ((character :initarg :character :reader forbidden-character.character)
       (stream :initarg :stream :reader forbidden-character.stream))
      (:report (lambda (condition stream)
                 (format stream
                         "Forbidden character ~@c"
                         (forbidden-character.character condition)))))
    

    Let's also define a reader function that signals an error for the character being read:

    (defun forbidden-character (stream character)
      (error 'forbidden-character
             :character character
             :stream stream))
    

    Here is the list reader that returns zero on an empty list:

    (defun read-zero-list (stream character)
      (assert (char= character #\())
      (let ((list (read-delimited-list #\) stream t)))
        (etypecase list
          (null 0)
          (cons list))))
    

    The limited-repl readtable is based on the standard readtable, except for two forbidden characters and a custom list reader:

    (defreadtable limited-repl
      (:merge :standard)
      (:macro-char #\( 'read-zero-list)
      (:macro-char #\: 'forbidden-character nil)
      (:macro-char #\# 'forbidden-character nil))
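
    For reference, a roughly equivalent readtable can be sketched with standard Common Lisp only, using copy-readtable and set-macro-character. The name *plain-limited-readtable* and the local functions forbid and zero-list are made up for this illustration; they are simplified stand-ins for forbidden-character and read-zero-list, so that the snippet runs on its own:

    ```lisp
    ;; Hypothetical variable name; not part of the answer's code.
    (defparameter *plain-limited-readtable*
      (let ((readtable (copy-readtable nil))) ; NIL = copy the standard readtable
        (flet ((forbid (stream char)
                 (declare (ignore stream))
                 ;; Simplified: a plain ERROR instead of a custom condition.
                 (error "Forbidden character ~@c" char))
               (zero-list (stream char)
                 (declare (ignore char))
                 ;; An empty list reads as 0, any other list as itself.
                 (or (read-delimited-list #\) stream t) 0)))
          ;; Third argument NIL makes each character a terminating macro character.
          (set-macro-character #\( #'zero-list nil readtable)
          (set-macro-character #\: #'forbid nil readtable)
          (set-macro-character #\# #'forbid nil readtable))
        readtable))

    ;; Usage: bind *READTABLE* around the call to READ.
    (let ((*readtable* *plain-limited-readtable*))
      (read-from-string "(+ () 1)")) ; => (+ 0 1)
    ```

    Unlike defreadtable, this does not register the readtable under a name, so it has to be passed around as a value (here through a special variable).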
    

    language package

    The limited-language package defines the base language the user can access, here only basic arithmetic symbols and the quit symbol:

    (defpackage limited-language
      (:use)
      (:export #:+
               #:-
               #:*
               #:/
               #:quit))
    

    I also redefine the functions as wrappers around the CL functions, as follows:

    (macrolet ((lift (s c) `(defun ,s (&rest args) (apply #',c args))))
      (lift limited-language:- cl:-)
      (lift limited-language:+ cl:+)
      (lift limited-language:* cl:*)
      (lift limited-language:/ cl:/))
    

    This is necessary because simply re-exporting the CL symbols could have unintended effects. For example, the user could access the variables * or /, which are bound by the Lisp REPL, and we do not want that here.

    temporary packages

    The following auxiliary function calls its function argument in a context where *package* is bound to a fresh, temporary package that uses limited-language. I am using GENTEMP to generate a fresh symbol in a package-names package:

    (defpackage package-names
      (:use))
    
    (defun call-with-temporary-package (function)
      (let* ((symbol (gentemp "SANDBOX-" 'package-names))
             (package (make-package symbol :use '(limited-language))))
        (unwind-protect (let ((*package* package))
                          (funcall function))
          (delete-package package)
          (unintern symbol 'package-names))))
    

    When no REPL is running, package-names is empty, as the generated symbols are uninterned on unwinding. The temporary package is also deleted. There might be a more robust way to define temporary names for packages, but this should be enough as a first draft.

    sandbox environment

    The within-sandbox macro establishes the context where the readtable and the package are set to the ones we want. I also bind *read-eval* to nil, just to be sure no code is evaluated while read is performed:

    (defmacro within-sandbox (&rest body)
      `(call-with-temporary-package
        (lambda ()
          (let ((*readtable* (find-readtable 'limited-repl))
                (*read-eval* nil))
            ,@body))))
    

    simple limited REPL

    Finally, the REPL is defined as follows, where quit is used to quit the REPL:

    (defun limited-repl ()
      (within-sandbox
       (loop
         (format t "~&> ")
         (finish-output)
         (clear-input)
         (handler-case (let ((form (read)))
                         (when (eq form 'limited-language:quit)
                           (return))
                         (eval form))
           (cl:end-of-file ()
             (return))
           (:no-error (v)
             (print v))
           (error (e)
             (format *error-output* "~&Error: ~a~%" e))))))
    

    example

    CL-USER> (limited-repl:limited-repl)
    
    > 5
    
    5 
    > (+ () (* 4 3) (/ 7 9))
    
    115/9 
    > #1=(list 0 . #1#)
    
    Error: Forbidden character #\#
    > *
    
    Error: The variable * is unbound.
    > (cl:in-package 'cl-user)
    
    Error: Forbidden character #\:
    > quit
    NIL
    CL-USER> 
    

    conclusion

    When limiting the symbols the user can access, you can rely on eval. However, this can still leak CL-specific semantics into your language: for example, typing * alone reports an unbound variable *, even though your language may have no concept of variables. In that case you would need to write your own evaluate function, which can delegate to eval but can also do other things outside of Common Lisp. There might be other things I failed to consider for the sandboxed environment (execution time limits, etc.), so be careful, but this should already be useful as-is.
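
    As a rough illustration of such an evaluate function, here is a minimal sketch for a language that has only numbers and function calls. The name evaluate and the error messages are hypothetical, and this variant applies functions directly rather than delegating to eval, which is only one of several possible designs:

    ```lisp
    ;; Hypothetical evaluator: numbers evaluate to themselves, a list is a
    ;; call to a globally defined function, and everything else is rejected.
    (defun evaluate (form)
      (typecase form
        (number form)
        (cons
         (destructuring-bind (operator &rest arguments) form
           ;; Only ordinary functions are callable; macros and special
           ;; operators are refused, so CL control structures stay out.
           (unless (and (symbolp operator)
                        (fboundp operator)
                        (not (macro-function operator))
                        (not (special-operator-p operator)))
             (error "Unknown operator: ~s" operator))
           (apply operator (mapcar #'evaluate arguments))))
        ;; Bare symbols end up here: this language has no variables.
        (t (error "Cannot evaluate ~s: this language has no variables" form))))
    ```

    In limited-repl, (eval form) would then become (evaluate form), and typing * alone would report the custom message instead of an unbound-variable error.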