Tags: xml, windows, xml-parsing, common-lisp, clisp

Charsets in Common Lisp


I've been working on a Common Lisp program that parses two XML files and combines them in a custom way to produce a third XML file. Unfortunately, my CLISP uses the CP1252 charset by default, while my XML files are UTF-8 and contain some Japanese characters that cannot be represented in CP1252.

I've been trying to make CLISP use UTF-8 by adding

:external-format 'charset:UTF-8

to both the load (as was suggested here) and read-line (because why not) calls, but CLISP still signals this error:

READ-LINE: Invalid byte #x81 in CHARSET:CP1252 conversion

Is there a way to do what I want with the code I have? I'm still fairly new to Lisp.

Full Read Function Code:

(defun readXML (stream libSize)
    (defparameter lib nil)
    (defparameter x 1)
    (loop
        (defparameter lib (cons (read-line stream :external-format 'charset:UTF-8) lib))
        (defparameter x (+ x 1))
        (when (> x libSize) (return lib))))

Solution

    Mistakes

    read-line

    This function does not accept the :external-format argument.

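    It does accept several optional parameters (input-stream, eof-error-p, eof-value and recursive-p), but none of them has anything to do with encodings. The encoding is a property of the stream and is chosen when the stream is opened.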
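
    As a small illustration of what those optional parameters do, here is a sketch that reads from a string stream, so no file or encoding is involved. The function name demo-read-line-eof is made up for this example. Passing nil for eof-error-p and :eof for eof-value makes read-line return :eof at the end of input instead of signalling an error.

```lisp
;; Hypothetical demo function; not part of the original answer.
(defun demo-read-line-eof ()
  "Read three times from a two-line string stream."
  (with-input-from-string (in (format nil "first~%second"))
    ;; Arguments after the stream: eof-error-p = NIL, eof-value = :EOF.
    (list (read-line in nil :eof)     ; first line
          (read-line in nil :eof)     ; second line (no trailing newline)
          (read-line in nil :eof))))  ; end of input, so the eof-value
```

    Calling (demo-read-line-eof) should return a list of the two lines followed by :EOF.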

    defparameter

    This is a "top-level" operator: it creates a global dynamic (special) variable. Never use it inside a function. Use let there instead, which binds variables lexically. loop (see below) also binds its own variables.
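
    To make the difference concrete, here is a minimal sketch of the same accumulate-and-count pattern written with let instead of defparameter. The function count-up and its variable names are invented for this illustration. Nothing it binds is visible outside the function.

```lisp
;; Hypothetical example; mirrors the shape of the question's loop.
(defun count-up (limit)
  "Return the list (1 2 ... LIMIT) using only lexical variables."
  (let ((result nil)   ; local accumulator, not a global
        (x 1))         ; local counter
    (loop
      (when (> x limit)
        ;; PUSH builds the list in reverse, so restore the order here.
        (return (nreverse result)))
      (push x result)
      (incf x))))
```

    Because result and x are lexical, two calls to count-up cannot interfere with each other, which is not true of variables created with defparameter.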

    Correct code

    (defun read-lines (file-name max-lines)
      "Open the file and read it line-by-line, at most `max-lines'."
      (with-open-file (stream file-name :external-format charset:utf-8)
        (loop :for line = (read-line stream nil nil)
          :for line-number :upfrom 0
          :while (and line (< line-number max-lines))
          :collect line)))
    

    Or, slightly simpler (as suggested by @jkiiski):

    (defun read-lines (file-name max-lines)
      "Open the file and read it line-by-line, at most `max-lines'."
      (with-open-file (stream file-name :external-format charset:utf-8)
        (loop :for line = (read-line stream nil nil)
          :repeat max-lines
          :while line
          :collect line)))
    

    Explanations

    • with-open-file opens the file, binds stream to the result and makes sure that the stream is closed on exit.

    • loop is a very advanced iteration facility. Here it binds line to each successive line, limits the number of lines (with line-number in the first version, with :repeat in the second), and collects the lines into the list that it returns.
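
    The loop clauses can be tried out without any file by reading from a string stream. The sketch below assumes a helper named take-lines (a made-up variant of read-lines that takes an already open stream) and a made-up demo function. It uses only standard Common Lisp, so it does not depend on CLISP's charset package.

```lisp
;; Hypothetical helper: same loop as read-lines, but on an open stream.
(defun take-lines (stream max-lines)
  "Collect at most MAX-LINES lines from STREAM."
  (loop :for line = (read-line stream nil nil)   ; NIL at end of input
        :for line-number :upfrom 0
        :while (and line (< line-number max-lines))
        :collect line))

;; Hypothetical demo: run take-lines on a three-line string.
(defun demo-take-lines ()
  (flet ((lines-of (text n)
           (with-input-from-string (in text)
             (take-lines in n))))
    (let ((text (format nil "a~%b~%c")))
      (list (lines-of text 2)      ; stops at the limit
            (lines-of text 10)))))  ; stops at end of input
```

    The first call stops because the line limit is reached, the second because read-line returns nil at the end of the input.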

    PS. Please follow all links in the answer. They explain each operator in detail.