
Unnecessary quotes added to JSON file elements using JsonOutput in Groovy


My Groovy script is creating a JSON file that looks like this:

[Image: JSON output]

The number of elements that go into the hsps array varies. My output is otherwise correct, but the script adds unnecessary quotes around each element of that array. The relevant code looks like this:

    foundPlasmids.each {
        def tempHSPs = []
        it.hsps.each {
            def hsps = JsonOutput.toJson(
                [bit_score: it.bit_score,
                 evalue: it.evalue,
                 score: it.score,
                 query_from: it.query_from,
                 query_to: it.query_to,
                 hit_from: it.hit_from,
                 hit_to: it.hit_to,
                 align_len: it.align_len,
                 gaps: it.gaps]
            )
            tempHSPs << JsonOutput.prettyPrint(hsps)
        }

        def output = JsonOutput.toJson(
            [contig: it.contig, title: it.title, accNumber: it.accession, length: it.length, noHSPs: it.noHsps, hsps: tempHSPs]
        )

        prettyOutput << JsonOutput.prettyPrint(output)
    }

foundPlasmids is a hash containing all the information including the hsps arrays. I prettyPrint all the hsps arrays into tempHSPs and pass tempHSPs to output. I can't figure out why the extra quotes are added and can't think of a different way to pass the hsps arrays into output. Thank you for any help.


Solution

  • The objects that you're putting into the tempHSPs array are string representations of JSON, and they're being produced by the prettyPrint function. All of the toJson functions in JsonOutput return strings, and prettyPrint takes a string, formats it, and returns a string.

    What you're not putting into tempHSPs is an actual JSON object or array. You're putting in a string, and thus the final output contains, within each top level element, an array "hsps" containing a single string value.
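
    A minimal sketch of that difference, using made-up sample values rather than your data (the two-key map is only an assumption for illustration). It serializes the same map once as a pre-encoded string and once as a plain Groovy map:

    ```groovy
    import groovy.json.JsonOutput

    // Hypothetical sample data standing in for one HSP entry.
    def hsp = [bit_score: 841.346, gaps: 0]

    // toJson returns a String, so the list below holds a String, not a Map.
    def asString = JsonOutput.toJson(hsp)
    def doubleEncoded = JsonOutput.toJson([hsps: [asString]])

    // Passing the Map itself lets toJson nest it as a real JSON object.
    def nested = JsonOutput.toJson([hsps: [hsp]])

    println doubleEncoded   // "hsps" is an array containing one quoted string
    println nested          // "hsps" is an array containing one JSON object
    ```

    Parsing both results back with JsonSlurper should show a String in the first case and a Map in the second.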

    There are two problems with this.

    One is that the strings do not appear to be escaped properly by the call to JsonOutput.toJson that builds output. I can only assume that is a bug in the JsonOutput class, which seems unlikely, but I don't have a better explanation. The result should look more like this:

    [
        {
            "nohsps": 1,
            "hsps": [
                "{\r\n    \"bit_score\": 841.346,\r\n    \"evalue\": 0,\r\n    (and so on)\r\n}"
            ]
        },
        {
            "nohsps": 6,
            "hsps": [
                "{\r\n    \"bit_score\": 767.48,\r\n    \"evalue\": 0,\r\n    (and so on)\r\n}"
            ]
        }
    ]
    

    The second problem is that it sounds like you didn't want strings but rather wanted JSON objects, so just stop turning your Groovy objects into strings...

    def tempHSPs = []
    it.hsps.each {
        // Build a plain map; do not call toJson or prettyPrint here.
        def hsps =
            [bit_score: it.bit_score,
             evalue: it.evalue,
             score: it.score,
             query_from: it.query_from,
             query_to: it.query_to,
             hit_from: it.hit_from,
             hit_to: it.hit_to,
             align_len: it.align_len,
             gaps: it.gaps]
        tempHSPs << hsps
    }
    

    or, if you want to simplify it, delete all that stuff about tempHSPs, let JsonOutput serialize the hsps directly, and see what it comes up with:

    def output = JsonOutput.toJson(
        // it.hsps is the current plasmid's own list of HSPs
        [contig: it.contig, title: it.title, accNumber: it.accession, length: it.length, noHSPs: it.noHsps, hsps: it.hsps]
    )
    

    (I haven't validated this syntax; I'm just working off memory.)

    If it chokes on the hsps object or you don't like the resulting output (for example, if you want to remove or rename some properties) resume making the map like you're doing now.
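
    Putting it together, here is a rough sketch of the whole loop with serialization done only once, at the end. It assumes foundPlasmids is a list of maps whose hsps entries are also maps; the sample values, the shortened key list, and the "extra" key are made up for illustration:

    ```groovy
    import groovy.json.JsonOutput

    // Hypothetical sample data; the real foundPlasmids comes from your script.
    def foundPlasmids = [
        [contig: 'contig_1', title: 'plasmid A', accession: 'ACC0001',
         length: 5000, noHsps: 1,
         hsps: [[bit_score: 841.346, evalue: 0, score: 455, extra: 'dropped']]]
    ]

    def records = foundPlasmids.collect { plasmid ->
        [contig: plasmid.contig,
         title: plasmid.title,
         accNumber: plasmid.accession,
         length: plasmid.length,
         noHSPs: plasmid.noHsps,
         // Keep plain Maps here; subMap copies only the listed keys.
         hsps: plasmid.hsps.collect { hsp ->
             hsp.subMap(['bit_score', 'evalue', 'score'])
         }]
    }

    // Convert to JSON a single time, then pretty-print the resulting String.
    def prettyOutput = JsonOutput.prettyPrint(JsonOutput.toJson(records))
    println prettyOutput
    ```

    If your hsps entries are objects rather than maps, build the inner map by hand as in the snippet earlier in this answer instead of using subMap.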