Json.Decode: Extract list item and flatten/merge it into record


I want to parse some JSON (Elasticsearch search results with inner_hits) to a flat record, but some of the record's fields should come from an item in a list that in turn resides in a nested object:

{
    "_index": "test",
    "_type": "doc",
    "_id": "AUG.02.013.1320.02630.0",
    "_score": null,
    "_routing": "1",
    "_source": {
        "child_mgrpid": "1.1",
        "child_varia": "blabla",
        "type": "child",
        "my_join_field": {
            "name": "child",
            "parent": "AUG.02.013.1320"
        },
        "@version": "1",
        "@timestamp": "2020-01-12T16:45:11.302Z"
    },
    "inner_hits": {
        "prnt": {
            "hits": {
                "total": 1,
                "max_score": null,
                "hits": [
                    {
                        "_index": "test",
                        "_type": "doc",
                        "_id": "AUG.02.013.1320",
                        "_score": null,
                        "_routing": "1",
                        "_source": {
                            "pt_archiv": "",
                            "pt_id": "AUG.02.013.1320",
                            "pt_titel": "",
                            "pt_l_id": "AUG.02.013",
                            "pt_l_name": "Johann Christoph von Freyberg-Eisenberg",
                            "pt_t_id": "AUG",
                            "pt_t_kurzform": "AUG",
                            "type": "parent",
                            "my_join_field": {
                                "name": "parent"
                            },
                            "@version": "1",
                            "@timestamp": "2020-01-12T16:45:08.470Z"
                        },
                        "sort": [
                            "AUG"
                        ]
                    }
                ]
            }
        }
    }
}

(Actually, there are more data fields than in the example, so I can't use the Decode.mapN functions.) I know for sure that there is always exactly one entry (the parent) in the inner_hits.prnt.hits.hits list. I want to take its _source fields and flatten them, together with fields from the "main" object, into a record whose type definition looks like this:

type alias Child =
    { grp_id : String
    , varia : String
    , pt_archive : String
    , pt_id : String
    , pt_title : String
    , pt_l_id : String
    , pt_l_name : String
    , pt_t_id : String
    , pt_t_shortLabel : String
    }

Here are the decoders I have so far:

type alias Parent =
    { pt_archive : String
    , pt_id : String
    , pt_title : String
    , pt_l_id : String
    , pt_l_name : String
    , pt_t_id : String
    , pt_t_shortLabel : String
    }


childHitDecoder : Decoder Child
childHitDecoder =
    Json.Decode.succeed Child
        |> Json.Decode.Pipeline.requiredAt [ "_source", "child_mgrp_id" ] Json.Decode.string
        |> Json.Decode.Pipeline.requiredAt [ "_source", "child_varia" ] Json.Decode.string
        |> Json.Decode.Pipeline.custom (Json.Decode.at [ "inner_hits", "prnt", "hits", "hits" ] (firstElementDecoder parentInnerHitDecoder) )


firstElementDecoder : Json.Decode.Decoder a -> Json.Decode.Decoder a
firstElementDecoder baseDecoder =
    Json.Decode.list baseDecoder
        |> Json.Decode.map List.head
        |> Json.Decode.andThen (Maybe.map Json.Decode.succeed >> Maybe.withDefault (Json.Decode.fail "Empty list"))


parentInnerHitDecoder : Decoder Parent
parentInnerHitDecoder =
    Json.Decode.succeed Parent
        |> Json.Decode.Pipeline.requiredAt [ "_source", "pt_archiv" ] Json.Decode.string
        |> Json.Decode.Pipeline.requiredAt [ "_source", "pt_id" ] Json.Decode.string
        |> Json.Decode.Pipeline.requiredAt [ "_source", "pt_titel" ] Json.Decode.string
        |> Json.Decode.Pipeline.requiredAt [ "_source", "pt_l_id" ] Json.Decode.string
        |> Json.Decode.Pipeline.requiredAt [ "_source", "pt_l_name" ] Json.Decode.string
        |> Json.Decode.Pipeline.requiredAt [ "_source", "pt_t_id" ] Json.Decode.string
        |> Json.Decode.Pipeline.requiredAt [ "_source", "pt_t_kurzform" ] Json.Decode.string

(The firstElementDecoder is from the question "Elm: Decode a JSON array with a single element into a string"; I wanted to use it to unwrap the single inner_hits list item.)

With a setup like this, the last line of my childHitDecoder gives an error:

This function cannot handle the argument sent through the (|>) pipe:

482|     Json.Decode.succeed Child
483|         |> Json.Decode.Pipeline.requiredAt [ "_source", "child_mgrp_id" ] Json.Decode.string
484|         |> Json.Decode.Pipeline.requiredAt [ "_source", "child_varia" ] Json.Decode.string
485|         |> Json.Decode.Pipeline.custom (Json.Decode.at [ "inner_hits", "prnt", "hits", "hits" ] (firstElementDecoder parentInnerHitDecoder) )
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The argument is:

    Decoder
        (
        String
        -> String
        -> String
        -> String
        -> String
        -> String
        -> String
        -> Child
        )

But (|>) is piping it to a function that expects:

    Decoder (Parent -> b)

Hint: It looks like it takes too many arguments. I see 6 extra.

I have successfully used Decode.Pipeline.custom to decode nested JSON objects into a flat Elm record in other places, but combining this with unwrapping the list is apparently beyond me.

This has had me perplexed for some days now. I have also tried combining bundles of fields with Decode.map, but that failed as well. I don't understand Decode.andThen or Decode.lazy well enough to see whether they could help here, and I would be very grateful for any help.


Solution

  • Using the JSON object that you listed above, the following should work:

    firstPtDecoder : String -> Json.Decode.Decoder String
    firstPtDecoder field =
        Json.Decode.map (Maybe.withDefault "" << List.head) <|
            Json.Decode.at [ "inner_hits", "prnt", "hits", "hits" ] <|
                Json.Decode.list (Json.Decode.at [ "_source", field ] Json.Decode.string)
    
    
    childHitDecoder : Json.Decode.Decoder Child
    childHitDecoder =
        Json.Decode.succeed Child
            |> Json.Decode.Pipeline.requiredAt [ "_source", "child_mgrpid" ] Json.Decode.string
            |> Json.Decode.Pipeline.requiredAt [ "_source", "child_varia" ] Json.Decode.string
            |> Json.Decode.Pipeline.custom (firstPtDecoder "pt_archiv")
            |> Json.Decode.Pipeline.custom (firstPtDecoder "pt_id")
            |> Json.Decode.Pipeline.custom (firstPtDecoder "pt_titel")
            |> Json.Decode.Pipeline.custom (firstPtDecoder "pt_l_id")
            |> Json.Decode.Pipeline.custom (firstPtDecoder "pt_l_name")
            |> Json.Decode.Pipeline.custom (firstPtDecoder "pt_t_id")
            |> Json.Decode.Pipeline.custom (firstPtDecoder "pt_t_kurzform")
    

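    Most of the work is done by firstPtDecoder, which reads the given field from the _source of each element of the inner "hits" array and returns the string from the first element. If the array is empty, it falls back to the empty string instead of failing.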
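
If you would rather keep a separate Parent decoder, as in the question, one possible alternative is to decode the Parent as a whole and copy its fields into Child in an ordinary function. This also explains the compiler error: the Child constructor still expects seven more String arguments at that point, while custom with a Decoder Parent supplies a single Parent value. The following is only a sketch using elm/json alone; parentSourceDecoder and toChild are names made up for this example, the field names are taken from the sample JSON, and Decode.index 0 is used in place of firstElementDecoder (it fails the decoder when the array is empty). With more than eight parent fields, parentSourceDecoder could be written in the pipeline style from the question instead of Decode.map7.

```elm
module ChildDecoder exposing (Child, Parent, childHitDecoder)

import Json.Decode as Decode exposing (Decoder)


type alias Parent =
    { pt_archive : String
    , pt_id : String
    , pt_title : String
    , pt_l_id : String
    , pt_l_name : String
    , pt_t_id : String
    , pt_t_shortLabel : String
    }


type alias Child =
    { grp_id : String
    , varia : String
    , pt_archive : String
    , pt_id : String
    , pt_title : String
    , pt_l_id : String
    , pt_l_name : String
    , pt_t_id : String
    , pt_t_shortLabel : String
    }


-- Hypothetical helper: decodes the parent's "_source" object.
-- Field names are the ones shown in the sample JSON.
parentSourceDecoder : Decoder Parent
parentSourceDecoder =
    Decode.map7 Parent
        (Decode.field "pt_archiv" Decode.string)
        (Decode.field "pt_id" Decode.string)
        (Decode.field "pt_titel" Decode.string)
        (Decode.field "pt_l_id" Decode.string)
        (Decode.field "pt_l_name" Decode.string)
        (Decode.field "pt_t_id" Decode.string)
        (Decode.field "pt_t_kurzform" Decode.string)


-- Hypothetical helper: merges the two child fields and the decoded
-- Parent into the flat Child record.
toChild : String -> String -> Parent -> Child
toChild grpId varia parent =
    { grp_id = grpId
    , varia = varia
    , pt_archive = parent.pt_archive
    , pt_id = parent.pt_id
    , pt_title = parent.pt_title
    , pt_l_id = parent.pt_l_id
    , pt_l_name = parent.pt_l_name
    , pt_t_id = parent.pt_t_id
    , pt_t_shortLabel = parent.pt_t_shortLabel
    }


-- Decode.index 0 picks the first inner hit and fails if the array is empty.
childHitDecoder : Decoder Child
childHitDecoder =
    Decode.map3 toChild
        (Decode.at [ "_source", "child_mgrpid" ] Decode.string)
        (Decode.at [ "_source", "child_varia" ] Decode.string)
        (Decode.at [ "inner_hits", "prnt", "hits", "hits" ]
            (Decode.index 0 (Decode.field "_source" parentSourceDecoder))
        )
```

Running Decode.decodeString childHitDecoder on the JSON text yields a Result Decode.Error Child. The trade-off compared with firstPtDecoder is that an empty inner "hits" array becomes a decoding error rather than a record full of empty strings, which may or may not be what you want.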