Tags: json, lisp, common-lisp, clisp, gnu-common-lisp

Get the right subsequence for the recursive call parsing a JSON string


I started writing the code below for a university homework assignment that consists of parsing a JSON string in Common Lisp. The main issue I'm facing right now is getting the right substring/subsequence to continue with the recursive call and parse the rest of the string. The basic idea is to check the whole string recursively. According to the given syntax, the input string can look like this:

    1) "{\"nome\" : \"arturo\" , \"cognome\" : \"durone\" , \"età\" : \"22\" , \"corso\" : \"informatica\"}"
    2) "{\"name\" : \"Zaphod\",\"heads\" : [[\"Head1\"], [\"Head2\"]]}"
    3) "{\"name\" : \"Zaphod\",{property:value, property : [1,2,3] }
    4) "[1,2,3]"

Basically, I remove every space and every \" from the string, which gives me a "clean" string like "{name:Zaphod,heads:[[Head1],[Head2]]}". Then I find the position of ':' and take the subsequence from index 0 up to, but not including, the ':' (the key), and I do the same for the second part (the value). The problem comes with the recursive call: I don't know how to pass the right index into the string.

I tried using the length of each element of the list that the function returns, but that doesn't help, because the string has already been split and the spaces and \" characters from the original input are no longer there. Could you please help me find a way to parse the rest of the JSON string with a recursive approach?

    ;; main function
    (defun j-obj (str)
      ;; correct_form is not shown in the question
      (cond ((correct_form str)
             (json-obj-aux (remove_braces (remove_backslash (remove_space str)))))))

    ;; aux function that analyzes the whole string through a recursive call
    (defun json-obj-aux (str)
      (cond ((= (length str) 0) nil)
            ((cons (aux_control str) nil))))
    ;; the recursive call I tried (commented out):
    ;   (json-obj-aux (subseq (shorter str)(length (aux_control (shorter str)))
    ;                          (length (shorter str))))))))

    ;; checks the whole string, splitting it at the first ":"
    (defun aux_control (str)
      (cons (subseq str 0 (search ":" str))
            (check_type (subseq str (+ (search ":" str) 1) (length str)))))

    (defun check_type (str)
      (cond ((equal (subseq str 0 1) "{") (obj_c str))
            ((equal (subseq str 0 1) "[") (cons (obj_array (remove_braces str)) nil))
            (t (cons (subseq str 0 (search "," str)) nil))))

    (defun obj_c (str)
      "{")

    (defun obj_array (str)
      (cond ((= 0 (length str)) nil)
            ((null (search "," str))
             (cons (subseq str (+ (search "[" str) 1) (- (length str) 1)) nil))
            ;; never reached: the previous clause already handles (null (search "," str))
            ((and (null (search "[" str)) (null (search "," str))) (cons str nil))
            ((null (search "[" str))
             (cons (subseq str 0 (search "," str))
                   (obj_array (subseq str (+ (search "," str) 1)))))
            ((cons (subseq str (+ (search "[" str) 1) (search "]" str))
                   (obj_array (subseq str (+ (search "," str) 1)))))))

    ;; removes every space character
    (defun remove_space (str)
      (cond ((= 0 (length str)) nil)
            ((concatenate 'string (remove_aux str) (remove_space (subseq str 1))))))

    (defun remove_aux (str)
      (cond ((equal (subseq str 0 1) " ") "")
            ((concatenate 'string (subseq str 0 1) ""))))

    ;; despite its name, removes every double-quote character (\")
    (defun remove_backslash (str)
      (cond ((= 0 (length str)) nil)
            ((concatenate 'string (remove_slash str) (remove_backslash (subseq str 1))))))

    (defun remove_slash (str)
      (cond ((equal (subseq str 0 1) "\"") "")
            ((concatenate 'string (subseq str 0 1) ""))))

    (defun remove_braces (str)
      (subseq str 1 (- (length str) 1)))

    (defun shorter (str)
      (subseq str 1 (length str)))

This is what I have so far. It is not completely wrong, since it can parse parts of the JSON string, but it cannot parse the whole string because I don't know how to pass the right index for the next subsequence:

    CL-USER 1 >  (j-obj "{\"name\" : \"Zaphod\",\"heads\" : [[\"Head1\"], [\"Head2\"]]}")
    (("name" "Zaphod"))


    CL-USER 2 > (j-obj "{\"heads\" : [[\"Head1\"], [\"Head2\"]]}")
    (("heads" ("Head1" "Head2")))

The right output should be:

    (("name" "Zaphod")("heads" ("Head1" "Head2")))

Solution

  • You should not remove characters from your input that help determine what comes next. {name:Zaphod,heads:[[Head1],[Head2]]} is not clean; it is invalid JSON. All keys in JSON must be strings, and all strings must be enclosed in double quotes (""), so a bare Head1 is not a valid JSON value.

    One way to do this cleanly is to first tokenize the string:

    "{\"name\" : \"Zaphod\",\"heads\" : [[\"Head1\"], [\"Head2\"]]}"
    

    yields

    {
    "name"
    :
    "Zaphod"
    ,
    "heads"
    :
    [
    [
    "Head1"
    ]
    ,
    [
    "Head2"
    ]
    ]
    }
    

    The parse-json function then takes a look at the first token: if it is a string, it yields a string; if it is a number, it yields a number; if it is a boolean, it yields that boolean; … if it is a {, it calls parse-json-obj; if it is a [, it calls parse-json-array.

    The parse-json-obj function repeatedly calls parse-key-value; after each key/value pair it looks at the next token and continues if it is a , or stops if it is a }.

    The parse-key-value function parses a string (signalling an error otherwise), then a :, then calls parse-json for the value.

    You can keep track of where you are in the token list by having each parse* function return the remaining tokens as a second value, so that the caller continues exactly where the callee stopped.
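A minimal Common Lisp sketch of the tokenizing step might look like the following. The function name tokenize-json and the token representation (structural characters as Lisp characters, JSON strings as Lisp strings without their quotes, numbers as Lisp numbers, true/false/null as the keywords :TRUE/:FALSE/:NULL) are assumptions made for illustration, not part of the assignment. It is deliberately simplified: the character after a backslash is taken literally (so \" works but \n and \uXXXX are not decoded), and numbers are converted with READ-FROM-STRING as a shortcut rather than a full JSON number grammar.

```lisp
;; Hypothetical helper: split a JSON text into a flat list of tokens.
;; Token format (an assumption for this sketch):
;;   { } [ ] : ,        -> the Lisp characters #\{ #\} #\[ #\] #\: #\,
;;   "..."              -> a Lisp string without the surrounding quotes
;;   numbers            -> Lisp numbers
;;   true false null    -> the keywords :TRUE :FALSE :NULL
(defun tokenize-json (str)
  "Return the list of JSON tokens in STR, in order."
  (let ((tokens '())
        (i 0)
        (n (length str)))
    (loop while (< i n)
          do (let ((c (char str i)))
               (cond
                 ;; Whitespace only separates tokens.
                 ((member c '(#\Space #\Tab #\Newline #\Return))
                  (incf i))
                 ;; Structural characters are tokens on their own.
                 ((find c "{}[]:,")
                  (push c tokens)
                  (incf i))
                 ;; String: collect characters up to the closing quote.
                 ;; The character after a backslash is taken literally.
                 ((char= c #\")
                  (let ((out (make-string-output-stream)))
                    (incf i)
                    (loop until (char= (char str i) #\")
                          do (when (char= (char str i) #\\)
                               (incf i))
                             (write-char (char str i) out)
                             (incf i))
                    (incf i)                      ; skip the closing quote
                    (push (get-output-stream-string out) tokens)))
                 ;; Number: take the run of number characters and let READ
                 ;; convert it, with *READ-EVAL* off so #. cannot run code.
                 ((or (digit-char-p c) (char= c #\-))
                  (let ((end (or (position-if
                                  (lambda (ch) (not (find ch "0123456789+-.eE")))
                                  str :start i)
                                 n)))
                    (push (let ((*read-eval* nil))
                            (read-from-string str t nil :start i :end end))
                          tokens)
                    (setf i end)))
                 ;; Anything else must be one of the literals true, false, null.
                 (t
                  (let ((word (find-if (lambda (w)
                                         (string= w str
                                                  :start2 i
                                                  :end2 (min n (+ i (length w)))))
                                       '("true" "false" "null"))))
                    (unless word
                      (error "Unexpected character ~S at position ~D" c i))
                    (push (intern (string-upcase word) :keyword) tokens)
                    (incf i (length word)))))))
    (nreverse tokens)))
```

Applied to the example string from the question, this returns the same sequence of tokens listed above: #\{ "name" #\: "Zaphod" #\, "heads" #\: #\[ #\[ "Head1" #\] #\, #\[ "Head2" #\] #\] #\}. The quotes are only stripped once each string's boundaries are known, which is why nothing has to be removed from the input beforehand.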
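The parsing steps described in this answer can be sketched as a small recursive-descent parser. This sketch assumes a hypothetical token list in which { } [ ] : , are Lisp characters (#\{, #\:, ...), JSON strings are Lisp strings, numbers are numbers and true/false/null are keywords; that token format, the helper expect-token, and the result format (an object becomes a list of (key value) lists, an array a plain list) are illustrative assumptions, not the assignment's required output. Each parse function returns two values, the parsed value and the tokens it did not consume, which replaces the index bookkeeping the question struggles with.

```lisp
;; Hypothetical recursive-descent parser over a token list.
;; Assumed token format: #\{ #\} #\[ #\] #\: #\, as Lisp characters,
;; JSON strings as Lisp strings, numbers as numbers, and
;; :TRUE / :FALSE / :NULL for the literals.
;; Assumed result format: an object is a list of (key value) lists,
;; an array is a plain list.
;; Every parse function returns two values: the parsed value and the
;; tokens it did not consume.

(defun expect-token (token tokens)
  "Signal an error unless TOKENS starts with TOKEN; return the tokens after it."
  (unless (eql (first tokens) token)
    (error "Expected ~S but found ~S" token (first tokens)))
  (rest tokens))

(defun parse-json (tokens)
  "Parse one JSON value from the front of TOKENS."
  (let ((token (first tokens)))
    (cond ((eql token #\{) (parse-json-obj (rest tokens)))
          ((eql token #\[) (parse-json-array (rest tokens)))
          ((or (stringp token) (numberp token) (keywordp token))
           (values token (rest tokens)))
          (t (error "Unexpected token ~S" token)))))

(defun parse-key-value (tokens)
  "Parse \"key\" : value and return (key value) plus the remaining tokens."
  (let ((key (first tokens)))
    (unless (stringp key)
      (error "Object keys must be strings, found ~S" key))
    (multiple-value-bind (value remaining)
        (parse-json (expect-token #\: (rest tokens)))
      (values (list key value) remaining))))

(defun parse-json-obj (tokens)
  "Parse object members; TOKENS starts right after the opening {."
  (if (eql (first tokens) #\})
      (values '() (rest tokens))                   ; empty object {}
      (let ((pairs '()))
        (loop
          (multiple-value-bind (pair remaining) (parse-key-value tokens)
            (push pair pairs)
            (setf tokens remaining))
          ;; After each member comes either , (more members) or } (the end).
          (cond ((eql (first tokens) #\,) (setf tokens (rest tokens)))
                ((eql (first tokens) #\}) (return))
                (t (error "Expected , or } but found ~S" (first tokens)))))
        (values (nreverse pairs) (rest tokens)))))

(defun parse-json-array (tokens)
  "Parse array elements; TOKENS starts right after the opening [."
  (if (eql (first tokens) #\])
      (values '() (rest tokens))                   ; empty array []
      (let ((elements '()))
        (loop
          (multiple-value-bind (element remaining) (parse-json tokens)
            (push element elements)
            (setf tokens remaining))
          ;; After each element comes either , (more elements) or ] (the end).
          (cond ((eql (first tokens) #\,) (setf tokens (rest tokens)))
                ((eql (first tokens) #\]) (return))
                (t (error "Expected , or ] but found ~S" (first tokens)))))
        (values (nreverse elements) (rest tokens)))))
```

For example, (parse-json '(#\[ 1 #\, 2 #\] #\,)) returns (1 2) as its primary value and (#\,) as its second value, which is exactly the token a caller such as parse-json-obj looks at next. On the tokens of the question's Zaphod example it returns (("name" "Zaphod") ("heads" (("Head1") ("Head2")))): because [["Head1"], ["Head2"]] is an array of two one-element arrays, the inner lists stay nested instead of being flattened. Note that with this untagged representation an empty object and an empty array both become NIL; a real solution would probably mark objects and arrays explicitly.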