I've been working on a Common Lisp program that parses two XML files and combines them in a custom way to produce a third XML file. Unfortunately, my Lisp (CLISP) defaults to the CP1252 charset, while my XML files use UTF-8 and contain some Japanese characters that cannot be represented in CP1252.
I've been trying to make CLISP use UTF-8 by adding
:external-format 'charset:UTF-8
to both the load (as was suggested here) and read-line (because why not) functions, but CLISP still throws this error:
READ-LINE: Invalid byte #x81 in CHARSET:CP1252 conversion
Is there a way to do what I want with the code I have? I'm still fairly new to Lisp.
Full Read Function Code:
(defun readXML (stream libSize)
  (defparameter lib nil)
  (defparameter x 1)
  (loop
    (defparameter lib (cons (read-line stream :external-format 'charset:UTF-8) lib))
    (defparameter x (+ x 1))
    (when (> x libSize) (return lib))))
read-line
This function does not accept the :external-format argument.
It does accept several optional parameters, but they have nothing to do with encodings.
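As a minimal sketch of what those optional parameters do: the standard lambda list is read-line &optional input-stream eof-error-p eof-value recursive-p, so the second and third arguments only control what happens at end of file. The function name and the sample text below are made up for illustration, and a string stream stands in for a file stream.

```lisp
;; Illustrative only: READ-TWO-LINES-DEMO is a hypothetical helper.
(defun read-two-lines-demo ()
  (with-input-from-string (stream (format nil "first~%second"))
    (list (read-line stream nil :eof)     ; an ordinary line
          (read-line stream nil :eof)     ; last line, no trailing newline
          (read-line stream nil :eof))))  ; end of file: returns :eof, no error
```

With eof-error-p left at its default of t, the third call would signal an end-of-file error instead of returning a value.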
defparameter
This is a "top-level" operator: it creates a global dynamic variable.
Never use it inside a function.
Use let there instead; it binds variables lexically. The loop macro (see below) also binds variables.
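To make the difference concrete, here is one way the function from the question might look with lexical bindings instead of defparameter. This is only a sketch: the name read-xml-lines is hypothetical, and unlike the original it stops at end of file and returns the lines in reading order rather than reversed.

```lisp
;; Hypothetical rewrite of the question's function using LET.
(defun read-xml-lines (stream lib-size)
  "Read up to LIB-SIZE lines from STREAM and return them in reading order."
  (let ((lib '()))                       ; local to this call, not a global
    (dotimes (i lib-size)
      (let ((line (read-line stream nil nil)))
        (when (null line) (return))      ; leave the DOTIMES at end of file
        (push line lib)))
    (nreverse lib)))
```

Because lib is bound by let, two calls to the function cannot interfere with each other, and nothing is left behind in the global environment.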
(defun read-lines (file-name max-lines)
  "Open the file and read it line-by-line, at most `max-lines'."
  (with-open-file (stream file-name :external-format charset:utf-8)
    (loop :for line = (read-line stream nil nil)
          :for line-number :upfrom 0
          :while (and line (< line-number max-lines))
          :collect line)))
Or, slightly simpler (as suggested by @jkiiski):
(defun read-lines (file-name max-lines)
  "Open the file and read it line-by-line, at most `max-lines'."
  (with-open-file (stream file-name :external-format charset:utf-8)
    (loop :for line = (read-line stream nil nil)
          :repeat max-lines
          :while line
          :collect line)))
with-open-file opens the file, binds stream to the result, and makes sure that the stream is closed on exit.
loop is a very advanced iteration facility. It binds line to each successive line, counts them using line-number (in the first version), and collects the lines into the return value.
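The way the loop clauses step together can be seen in a small sketch that reads from a string instead of a file. The helper name collect-lines-demo is made up, and it collects each line together with its line number purely to make the counter visible.

```lisp
;; Illustrative only: same clauses as the first READ-LINES, on a string stream.
(defun collect-lines-demo (text max-lines)
  (with-input-from-string (stream text)
    (loop :for line = (read-line stream nil nil)   ; NIL once the input ends
          :for line-number :upfrom 0               ; 0, 1, 2, ...
          :while (and line (< line-number max-lines))
          :collect (cons line-number line))))

;; (collect-lines-demo (format nil "a~%b~%c") 2)
;; returns ((0 . "a") (1 . "b"))
```

The loop ends as soon as either condition in the :while clause fails, so a short input simply yields fewer lines than max-lines.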
PS. Please follow all links in the answer. They explain each operator in detail.