Search code examples
sqlgoogle-bigqueryreddit

Joining Posts with Comments in the BigQuery Reddit dataset


How are the comments and posts tables related in the Reddit dataset available on BigQuery? It doesn't seem obvious.


Solution

  • Below is for BigQuery Standard SQL

    #standardSQL
    SELECT posts.title, comments.body
    FROM `fh-bigquery.reddit_comments.2016_01` AS comments
    JOIN `fh-bigquery.reddit_posts.2016_01`  AS posts
    ON posts.id = SUBSTR(comments.link_id, 4) 
    WHERE posts.id = '43go1r'
    

    If you still using BigQuery Legacy SQL consider migrating to BigQuery Standard SQL.

    Btw, performance wise it took 2 sec vs. 18 sec in Legacy SQL