Tags: scala, apache-spark, apache-spark-sql

Any workaround for JSONPATH wildcard not supported in Spark SQL


spark.sql("""select get_json_object('{"k":{"value":"abc"}}', '$.*.value') as j""").show()

This returns null, while it should return 'abc'. It works if I replace * with k.
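
For reference, the working non-wildcard form of the same query looks like this:

spark.sql("""select get_json_object('{"k":{"value":"abc"}}', '$.k.value') as j""").show()

which prints a single row with j = abc.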

I'm aware of the limited JSONPath support (https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF#LanguageManualUDF-get_json_object), but is there a way to achieve this in Spark SQL?


Solution

  • There is an open Spark JIRA, "Any depth search not working in get_json_object ($..foo)", for full JsonPath support.

    Until it is resolved, I'm afraid creating a UDF that wraps a "general-purpose" JsonPath implementation might be the one and only option; a registration sketch follows the spark-shell transcript below:

    > spark-shell --jars "json-path-2.7.0.jar"                      
    :
    scala> import com.jayway.jsonpath.JsonPath
    import com.jayway.jsonpath.JsonPath
    
    scala> val r = JsonPath.read[net.minidev.json.JSONArray]("""
         | {"k":{"value":"abc"},
         |  "m":{"value":"def"}}
         | ""","$.*.value")
    r: net.minidev.json.JSONArray = ["abc","def"]
    
    scala>
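
    To make this usable from Spark SQL, here is a minimal sketch of such a UDF (the name get_json_path is arbitrary, and json-path together with its json-smart dependency is assumed to be on the driver and executor classpath):

    import com.jayway.jsonpath.JsonPath
    import net.minidev.json.JSONArray
    import scala.util.Try

    // Hypothetical UDF: evaluates an arbitrary JsonPath expression with the Jayway
    // library and returns the matches serialized as a JSON array string, or null
    // when the path does not match or the input is not valid JSON.
    spark.udf.register("get_json_path", (json: String, path: String) =>
      Try(JsonPath.read[JSONArray](json, path).toJSONString).toOption.orNull
    )

    spark.sql("""select get_json_path('{"k":{"value":"abc"}}', '$.*.value') as j""").show()
    // j: ["abc"]

    Note that this sketch assumes an indefinite path (wildcard or deep scan), for which the Jayway library returns a list of matches; a definite path such as $.k.value returns the scalar value itself and would need different handling.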