I have a df_movies and col of geners that look like json format.
|genres |
[{'id': 28, 'name': 'Action'}, {'id': 12, 'name': 'Adventure'}, {'id': 37, 'name': 'Western'}]
How can I extract the first field of 'name': val?
way #1
df_movies.withColumn
("genres_extract",regexp_extract(col("genres"),
""" 'name': (\w+)""",1)).show(false)
way #2
df_movies.withColumn
("genres_extract",regexp_extract(col("genres"),
"""[{'id':\s\d,\s 'name':\s(\w+)""",1))
Excepted: Action
You can use get_json_object function:
Seq("""[{"id": 28, "name": "Action"}, {"id": 12, "name": "Adventure"}, {"id": 37, "name": "Western"}]""")
.toDF("genres")
.withColumn("genres_extract", get_json_object(col("genres"), "$[0].name" ))
.show()
+--------------------+--------------+
| genres|genres_extract|
+--------------------+--------------+
|[{"id": 28, "name...| Action|
+--------------------+--------------+