Need help querying top reddit comments by top posts

I'm trying to collect the (let's say top 20) most upvoted comments within the most upvoted posts of a given subreddit.

Any help would be appreciated!

I've gotten to this code which I use in bigquery, but I can't seem to get the post score and the comment score without a duplicate issue.

SELECT posts.title, posts.score, comments.body, posts.subreddit
FROM `fh-bigquery.reddit_comments.2018_10` AS comments
JOIN `fh-bigquery.reddit_posts.2018_10`  AS posts
ON posts.id = SUBSTR(comments.link_id, 4) 
WHERE posts.subreddit = 'Showerthoughts'

For a simplified example I would like to be able to see:

Post Title 1 | Post Score | (Within Post Title 1) Comment Body 1 | Comment Score

Post Title 1 | Post Score | (Within Post Title 1) Comment Body 2 | Comment Score

Post Title 2 | Post Score | (Within Post Title 2) Comment Body 1 | Comment Score

Post Title 2 | Post Score | (Within Post Title 2) Comment Body 2 | Comment Score

Solution

Here is a quick way of getting around the issue of duplicate text blobs:

select title, score, body, subreddit from (
    SELECT 
        to_hex(md5(posts.title)), 
        array_agg(posts.title)[offset(0)] as title, 
        array_agg(comments.body)[offset(0)] as body, 
        array_agg(posts.score)[offset(0)] as score, 
        array_agg(posts.subreddit)[offset(0)] as subreddit
    FROM `fh-bigquery.reddit_comments.2018_10` AS comments
    JOIN `fh-bigquery.reddit_posts.2018_10`  AS posts
    ON posts.id = SUBSTR(comments.link_id, 4) 
    WHERE posts.subreddit = 'Showerthoughts'
    group by 1
    order by 1
)

The idea is to convert that expensive text blob into a md5 hash, and then going about your usual business with the unique entries. You can sort stuff the way you want out of these distinct values.