How are the comments
and posts
tables related in the Reddit dataset available on BigQuery?
It doesn't seem obvious.
Below is for BigQuery Standard SQL
#standardSQL
SELECT posts.title, comments.body
FROM `fh-bigquery.reddit_comments.2016_01` AS comments
JOIN `fh-bigquery.reddit_posts.2016_01` AS posts
ON posts.id = SUBSTR(comments.link_id, 4)
WHERE posts.id = '43go1r'
If you still using BigQuery Legacy SQL consider migrating to BigQuery Standard SQL
.
Btw, performance wise it took 2 sec vs. 18 sec in Legacy SQL