Tags: json, scala, apache-spark

Get category of movie from JSON struct using Spark Scala


I have a DataFrame df_movies with a genres column that looks like JSON:

|genres                                                                                         |
|[{'id': 28, 'name': 'Action'}, {'id': 12, 'name': 'Adventure'}, {'id': 37, 'name': 'Western'}] |

How can I extract the value of the first 'name' field?

way #1

df_movies.withColumn("genres_extract",
  regexp_extract(col("genres"), """ 'name': (\w+)""", 1))
  .show(false)

way #2

df_movies.withColumn("genres_extract",
  regexp_extract(col("genres"), """[{'id':\s\d,\s 'name':\s(\w+)""", 1))

Expected: Action


Solution

  • You can use the get_json_object function with a JSON path expression:

      import org.apache.spark.sql.functions.{col, get_json_object}
      import spark.implicits._   // for toDF on a local Seq

      Seq("""[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}, {"id": 37, "name": "Western"}]""")
        .toDF("genres")
        .withColumn("genres_extract", get_json_object(col("genres"), "$[0].name"))
        .show()
    
    
    +--------------------+--------------+
    |              genres|genres_extract|
    +--------------------+--------------+
    |[{"id": 28, "name...|        Action|
    +--------------------+--------------+
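  • Note that the sample column in the question uses single quotes rather than strict JSON, and get_json_object may return null for such input. A minimal sketch, assuming the real genres column looks exactly like the question's sample and the genre names themselves contain no quotes, is to normalize the quotes first:

      import org.apache.spark.sql.functions.{col, get_json_object, regexp_replace}

      df_movies
        // turn the single-quoted, dict-like string into valid JSON
        .withColumn("genres_json", regexp_replace(col("genres"), "'", "\""))
        // then pull the first name out of the array with a JSON path
        .withColumn("genres_extract", get_json_object(col("genres_json"), "$[0].name"))
        .show(false)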