i'm using Flink SQL and the following scheme shows my source data (belonging to some Twitter data):
CREATE TABLE `twitter_raw` (
`entities` ROW(
`hashtags` ROW(
`text` STRING,
`indices` INT ARRAY
) ARRAY,
`urls` ROW(
`indices` INT ARRAY,
`url` STRING,
`display_url` STRING,
`expanded_url` STRING
) ARRAY,
`user_mentions` ROW(
`screen_name` STRING,
`name` STRING,
`id` BIGINT
) ARRAY
)
)
WITH (...);
I want to get only the hashtags in a collection. Therefore i have to map the collection of constructed objects (ROW) to an array of STRING.
Like this scheme:
CREATE TABLE `twitter_raw` (
`entities` ROW(
`hashtags` STRING ARRAY,
`urls` STRING ARRAY,
`user_mentions` STRING ARRAY
)
)
WITH (...);
How can i achieve this with Flink-SQL? Maybe built-in functions (JSON-functions?) or own UDF or do i have to write a DataStream Job?
Thanks in advance.
The SQL command UNNEST
helps in this case. It is like EXPLODE
in Spark.
You can solve it by creating a new row for each hashtag in the hashtags
array:
SELECT hashtag, index
FROM twitter_raw
CROSS JOIN UNNEST(hashtags) AS t (hashtag, index)