
How can I unserialize an s-expression, then serialize it again, and avoid upper-casing?


I am supposed to read a complex s-expression tree, modify some nodes and save it somewhere.

It appears that in the process the 'read function is "modifying" the input. For instance, in a simple example:

CL-USER> (read-from-string "(seq_num 123)")
(SEQ_NUM 123)
13

You can see that it capitalizes nodes (as well as values).

It also appears that it can add pipes around a value, as in:

CL-USER> (read-from-string "(password 000006013H)")
(PASSWORD |000006013H|)
21

It adds pipes!

Is there a way to tell 'read not to do that? I guess such modifications are done for good reason when the s-expression is actually a valid Lisp program, but that is not the case here. Think of the file as an XML file: a simple configuration file that happens to be an s-expression. I don't need the reader to "intern" the symbols it reads. I just need it to unserialize the file as a tree, since for me that is the easiest way to search the tree afterwards ('car and 'cdr are so nice).

Well, if the tree is formed, then every symbol must be interned. In other words, how can I tell the reader to intern no symbols and keep them as strings instead? (It could still form the cons tree, but instead of pointing to symbols it would point to character strings. You see what I mean?)


Solution

  • The reader will by default intern symbols. Note that both the reader and the printer affect how symbols are used and how they appear. If you want to see the real case of a symbol, call (symbol-name some-symbol). The printer will escape a symbol if necessary, so that it can be read back with the same case.

    CL-USER 26 > 'foo
    FOO
    
    CL-USER 27 > 'f\oo
    |FoO|
    
    CL-USER 28 > (symbol-name 'f\oo)
    "FoO"
    

    The reader lets you control how a symbol gets read. See below.

    A few things to know:

    • symbol names are by default uppercase internally, because by default the reader upcases lowercase characters.

    • a symbol can contain arbitrary characters, including lowercase characters. The symbol then needs to use escapes:

    Example:

    |This is a valid symbol.|
    

    Note that the vertical bars are not part of the symbol name. They are used to escape the symbol. Another escape character is the backslash:

    1\a2
    

    Above is also a symbol.

    • note that tokens made of digits and letters can be either symbols or numbers, depending on the read base:

    Example:

    00a
    

    Above is a symbol when the read base is 10.

    Same Example, other read base:

    00a
    

    Above is a number when the read base is 16.

    • a non-interned symbol (not in a package) is written like this:

    Example:

    #:non-interned-symbol
    
    • a keyword symbol:

    Example:

    :keyword-symbol
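
The points in the list above can be checked at the REPL. This is only a minimal sketch: it assumes the standard readtable is current, and it binds *read-base* explicitly for the 00a case. Values are shown only where the result does not depend on the implementation.

```lisp
;; Escaped tokens keep the case of the characters they cover.
(symbol-name (read-from-string "|This is a valid symbol.|"))
;; => "This is a valid symbol."

;; The backslash escapes a single character (written \\ inside a Lisp string).
(symbol-name (read-from-string "1\\a2"))
;; => "1a2"

;; The same token is a number once the read base makes A a digit.
(let ((*read-base* 16))
  (values (read-from-string "00a")))
;; => 10

;; In base 10 the answer reports a symbol for this token.
(let ((*read-base* 10))
  (symbolp (read-from-string "00a")))

;; #: reads a symbol that belongs to no package.
(symbol-package (read-from-string "#:non-interned-symbol"))
;; => NIL

;; A leading colon reads a keyword.
(keywordp (read-from-string ":keyword-symbol"))
;; => T
```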
    

    How can you influence in which case a symbol is created/looked-up during read?

    • you can escape the symbol, see above

    • Use a readtable with a different case.

    For an example, see the Common Lisp HyperSpec: 23.1.2.1 Examples of Effect of Readtable Case on the Lisp Reader
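
As a rough illustration of the readtable approach, the sketch below reads a form with the readtable case set to :preserve, so symbol names keep the case they were written in. The function names read-preserving-case and symbols-to-strings are made up for this example. The second function is one possible way to get the tree of strings the question asks for: the symbols are still interned during the read and are converted afterwards.

```lisp
(defun read-preserving-case (string)
  "Read one form from STRING, keeping symbol names in their written case."
  (let ((*readtable* (copy-readtable nil)) ; fresh copy of the standard readtable
        (*read-eval* nil))                 ; do not evaluate #. in a config file
    (setf (readtable-case *readtable*) :preserve)
    (values (read-from-string string))))

(defun symbols-to-strings (tree)
  "Return a copy of TREE in which every symbol is replaced by its name."
  (cond ((consp tree)
         (cons (symbols-to-strings (car tree))
               (symbols-to-strings (cdr tree))))
        ((null tree) nil)                  ; keep the list terminator as NIL
        ((symbolp tree) (symbol-name tree))
        (t tree)))                         ; numbers, strings etc. stay as they are

(symbol-name (first (read-preserving-case "(seq_num 123)")))
;; => "seq_num"

(symbols-to-strings (read-preserving-case "(seq_num 123)"))
;; => ("seq_num" 123)
```

Binding *readtable* inside the function keeps the change local, so the rest of the program still reads code with the usual upcasing.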

    You can also turn off escaping in the printer:

    CL-USER 36 > (let ((*print-escape* nil))
                   (write (read-from-string "(|passWord| 000006013H)")))
    (passWord 000006013H)
    ...
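
Putting the two halves together, a read and write round trip might look like the sketch below. The name round-trip is hypothetical, and combining a :preserve readtable with unescaped printing is an assumption about what the asker needs rather than something stated in the answer.

```lisp
(defun round-trip (string)
  "Read one form from STRING and write it back with symbol case unchanged."
  (let ((*readtable* (copy-readtable nil)) ; fresh copy of the standard readtable
        (*read-eval* nil))                 ; do not evaluate #. in a config file
    (setf (readtable-case *readtable*) :preserve)
    (let ((tree (read-from-string string)))
      ;; A real program would modify TREE here before writing it out.
      ;; The printer consults the same readtable, so with escaping off
      ;; every symbol is written in its own case and without bars.
      (write-to-string tree :escape nil :readably nil :pretty nil))))

(round-trip "(seq_num 123)")
;; => "(seq_num 123)"
```

With escaping turned off, strings are also written without their double quotes, so this only suits trees made of symbols and numbers.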