
Should the PARSE dialect be used on tasks that are fundamentally about modifying the input?


In honor of Rebol 3 going open source any-minute-now (?), I'm back to messing with it. As an exercise I'm trying to write my own JSON parser in the PARSE dialect.

Since Douglas Crockford credits Rebol's influence on his discovery of JSON, I thought it would be easy. Apart from replacing braces with brackets and getting rid of all those commas, one of the barriers to simply using LOAD on the string is that where JSON wants the equivalent of a SET-WORD!, it uses something that looks like a string to Rebol's tokenizer, followed by an illegal stray colon:

{
    "key one": {
         "summary": "This is the string content for key one's summary",
         "value": 7
    },
    "key two": {
         "summary": "Another actually string, not supposed to be a 'symbol'",
         "value": 100
    }
}

Basically I wanted to find all the cases that were like "foo bar": and turn them into foo-bar: while leaving matching quote pairs that were not followed by colons alone.

When I tackled this in PARSE (which I understand rather well in principle but still haven't used much), a couple of questions came up. Mainly: under what promised conditions can you escape into code and modify the series out from under the parser, specifically in Rebol 3? More generally, is PARSE the "right kind of tool for the job"?

Here was the rule I tried, that appears to work for this part of the task:

any [
    ; require a matched pair of quotes & capture series positions before
    ; and after the first quote, and before the last quote

    to {"} beforePos: skip startPos: to {"} endPos: skip

    ; optional colon next (if not there the rest of the next rule is skipped)

    opt [
        {:}

        ; if we got to this part of the optional match rule, there was a colon.
        ; we escape to code changing spaces to dashes in the range we captured

        (
            setWordString: copy/part startPos endPos
            replace/all setWordString space "-"
            change startPos setWordString
        )

        ; break back out into the parse dialect, and instead of changing the 
        ; series length out from under the parser we jump it back to the position
        ; before that first quote that we saw

        :beforePos

        ; Now do the removals through a match rule.  We know they are there and
        ; this will not cause this "colon-case" match rule to fail...because we
        ; saw those two quotes on the first time through!

        remove [{"}] to {"} remove [{"}]
    ]
]

Is that okay? Is there any chance of the change startPos setWordString in the open code mucking up the outer parse...if not in this case, then in something subtly different?

As always, any didactic "it's cleaner/shorter/better this other way" advice is appreciated.

P.S. why isn't there a replace/all/part?


Solution

  • The new PARSE keywords such as change, insert and remove should facilitate this type of thing. I guess the main downside of this approach is the cost of pushing series contents around (I've seen it mentioned that it is faster to build new strings than to modify existing ones in place).

    token: [
        and [{"} thru {"} any " " ":"]
        remove {"} copy key to {"} remove {"} remove any " "
        (key: replace/all key " " "-")
    ]
    
    parse/all json [
        any [
            to {"} [
                and change token key
                ; next rule here, example:
                copy new-key thru ":" (probe new-key)
                | skip
            ]
        ]
    ]
    

    This is a bit convoluted as I can't seem to get 'change to work as I'd expect (behaves like change, not change/part), but in theory you should be able to make it shorter along these lines and have a fairly clean rule. Ideal might be:

    token: [
        {"} copy key to {"} skip any " " and ":"
        (key: replace/all key " " "-")
    ]
    
    parse/all json [
        any [
            to {"} change token key
            | thru {"}
        ]
    ]
    

    Edit: Another fudge around change -

    token: [
        and [{"} key: to {"} key.: skip any " " ":"]
        (key: replace/all copy/part key key. " " "-")
        remove to ":" insert key
    ]
    
    parse/all json [
        any [to {"} [token | skip]]
    ]