Search code examples
jsonforeachopenrefinegrel

Parse multivalued JSON in GREL (OpenRefine)


I have a column with the following content:

7. {"resource":"abc"}
8. [{"resource":"def"},{"resource":"ghi"}]

I try to get the content of "resource":

value.parseJson().resource

Works. If I try to get the content of multivalued cells, I can't get it to work. I tried according to https://github.com/OpenRefine/OpenRefine/wiki/GREL-Other-Functions:

forEach(value.parseJson().resource,v,v.resource)

I get a lot of errors, for example:

7.  {"resource":"abc"}  Error: First argument to forEach is not an array
8.  [{"resource":"def"},{"resource":"ghi"}] Error: Object does not have any field, including resource

And I can only get the multivalued cells, but not the single value cells with this one:

forEach(value.parseJson(),v,v.resource)

Solution

  • A problem you have here is the difference between the cells that contain JSON arrays (i.e. multiple values) and those that don't.

    There are different ways of approaching this issue, and the best approach may depend on how consistent the data is overall.

    My first suggestion would be to use a filter or facet to work with the single value and array cells separately. Given the data you use in your example I think the following would work:

    1) Create a Custom Text Facet on the column using the GREL

    value.startsWith("[")
    

    2) Select 'false' from the facet to work with the single value cells 3) For these cells use the GREL

    value.parseJson().resource
    

    4) Select 'true' from the facet to work with the array cells 5) For these cells use the GREL

    forEach(value.parseJson(),v,v.resource)
    

    (n.b. this is slightly different to the GREL you mention in your question as giving errors)

    6) The output of this GREL is an OpenRefine array. You'll need to convert this to a string to store the output in a cell - so you may need to use something like:

    forEach(value.parseJson(),v,v.resource).join("|")