I have a value serialized by PHP that I need to decode in Clojure. I'm using this library to deserialize it; it is built on Instaparse, which uses EBNF/ABNF notation to define the grammar. For reference, here's the full definition:
<S> = expr
<expr> = (string | integer | double | boolean | null | array)+
<digit> = #'[0-9]'
<number> = negative* (decimal-num | integer-num)
<negative> = '-'
<integer-num> = digit+
<decimal-num> = integer-num '.' integer-num
<zero-or-one> = '0'|'1'
size = digit+
key = (string | integer)
<val> = expr
array = <'a:'> <size> <':{'> (key val)+ <'}'> <';'>?
boolean = <'b:'> zero-or-one <';'>
null = <'N;'>
integer = <'i:'> number <';'>
double = <'d:'> number <';'>
string = <'s:'> <size> <':\\\"'> #'([^\"]|\\.)*' <'\\\";'>
I've found a bug in this library: it can't handle serialized strings that contain the " character.
php > echo serialize('{"key":"value"}');
s:15:"{"key":"value"}";
Deserialized using the library, it blows up when it finds that second ":
> (deserialize-php "s:15:\"{\"key\":\"value\"}\";")
[:index 7]
The problem exists on this line of the grammar definition:
string = <'s:'> <size> <':\\\"'> #'([^\"]|\\.)*' <'\\\";'>
You'll notice that the string definition excludes the " character. That's not correct, though: the string could contain any character, and the size is what determines where it ends. I'm not a BNF expert, so I'm trying to figure out what my options are here.
Is it possible to use the size as the correct number of characters to grab? If that's not possible, does someone see a way I can tweak the grammar definition to enable correct parsing?
As stated by Arthur Ulfeldt, this grammar is not context-free because of the length-prefixed (bencode-style) strings. Nonetheless, it is simple to parse, just not with ABNF/EBNF. For example, using Parse-EZ instead:
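A convenience macro (assuming the Parse-EZ functions have been referred into the current namespace, e.g. with (use 'protoflex.parse)):
(defmacro tagged-sphp-expr [tag parser]
  `(fn [] (between #(string ~(str tag ":")) #(~parser) #(string ";"))))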
The rest:
(def sphp-integer (tagged-sphp-expr "i" integer))
(def sphp-decimal (tagged-sphp-expr "d" decimal))

(defn sphp-boolean []
  (= \1 ((tagged-sphp-expr "b" #(chr-in "01")))))

(defn sphp-null [] (string "N;") :null)

(defn sphp-string []
  (let [tag      (string "s:")
        size     (integer)
        open     (no-trim #(string ":\""))
        contents (read-n size)
        close    (string "\";")]
    contents))

(declare sphp-array)

(defn sphp-expr []
  (any #(sphp-integer) #(sphp-decimal) #(sphp-boolean) #(sphp-null) #(sphp-string) #(sphp-array)))

(defn sphp-key []
  (any #(sphp-string) #(sphp-integer)))

(defn sphp-kv-pair []
  (apply array-map (series #(sphp-key) #(sphp-expr))))

(defn sphp-array []
  (let [size     (between #(string "a:") #(integer) #(string ":{"))
        contents (times size sphp-kv-pair)]
    (chr \})
    (attempt #(chr \;))
    contents))
The test:
(def test-str "i:1;d:2;s:16:\"{\"key\": \"value\"}\";a:2:{s:3:\"php\";s:3:\"sux\";s:3:\"clj\";s:3:\"rox\";};b:1;")
(println test-str)
;=> i:1;d:2;s:16:"{"key": "value"}";a:2:{s:3:"php";s:3:"sux";s:3:"clj";s:3:"rox";};b:1;
(parse #(multi* sphp-expr) test-str)
;=> [1 2.0 "{\"key\": \"value\"}" [{"php" "sux"} {"clj" "rox"}] true]
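On the narrower question of whether the size can decide how many characters to grab: the idea does not depend on any parser library. Below is a minimal sketch in plain Clojure, not part of the library in question or of Parse-EZ. The helper name read-php-string is hypothetical. It also assumes the declared size equals the number of Java characters, which only holds for ASCII content, since PHP counts bytes rather than characters.

```clojure
(defn read-php-string
  "Hypothetical helper: reads one PHP-serialized string of the form
  s:<size>:\"<contents>\"; starting at index `start` of `input`.
  Returns [contents next-index], or nil if no well-formed string
  starts at that position."
  [input start]
  ;; Match only the header; the contents are never scanned for quotes.
  (when-let [[header digits] (re-find #"^s:(\d+):\"" (subs input start))]
    (let [size  (Long/parseLong digits)
          open  (+ start (count header)) ; first character of the contents
          close (+ open size)            ; where the closing \"; must begin
          end   (+ close 2)]
      ;; ASSUMPTION: size is treated as a character count (ASCII-only input).
      (when (and (<= end (count input))
                 (= "\";" (subs input close end)))
        [(subs input open close) end]))))

(read-php-string "s:15:\"{\"key\":\"value\"}\";" 0)
;=> ["{\"key\":\"value\"}" 23]
```

Because the contents are taken by length, embedded " characters need no special handling. The returned index lets a caller continue with whatever follows the string.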