Exploring comments up to a particular depth in PRAW?


Is there any way to restrict how deep the comments of a particular reddit post are explored? We have replace_more_comments, which tries to replace as many more_comments as possible, but can we restrict this expansion? Or do I need to write my own version of a DFS over these comments?

Thanks


Solution

  • Since you mention replace_more_comments, I'm assuming you are talking about PRAW 3.5.

    Sadly, PRAW doesn't expose this information as something like comment.depth. It actually doesn't keep the depth anywhere.

    If you only need a small, fixed depth (for example, first- and second-level comments only), you can do it with nested loops, without DFS or BFS.

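    # Expand every "more comments" placeholder first (PRAW 3.5 API).
    submission.replace_more_comments(limit=None, threshold=0)
    for top_level_comment in submission.comments:
        # Replies to a top-level comment are the second level.
        for second_level_comment in top_level_comment.replies:
            print(second_level_comment.body)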
    

    If you want an arbitrary (non-fixed) depth, you are left with your own implementation. Because of how comments are structured and retrieved from the reddit API, you should use BFS rather than DFS.

    There is also another way, available in PRAW 4.0 (it was released yesterday). Here is the particular part of the docs I'm referring to:

    submission.comments.replace_more(limit=0)
    comment_queue = submission.comments[:]  # Seed with top-level
    while comment_queue:
        comment = comment_queue.pop(0)
        print(comment.body)
        comment_queue.extend(comment.replies)
    

    While it is awesome to be able to do your own breadth-first traversals, CommentForest provides a convenience method, list(), which returns a list of comments traversed in the same order as the code above. Thus the above can be rewritten as:

    submission.comments.replace_more(limit=0)
    for comment in submission.comments.list():
        print(comment.body)
    

    From this you receive a list of comments in the order that BFS would give you:

    [first_level_comment, first_level_comment, first_level_comment, second_level_comment, 
    second_level_comment, third_level_comment, ...]
    

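    From that flat list it is not complicated to split the comments by level using each comment's id and parent_id: a comment whose parent_id refers to the submission is top-level, and any other comment sits one level below its parent.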
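
As a rough sketch of the two ideas above (none of this is part of PRAW itself): walk_to_depth is a do-it-yourself breadth-first traversal that stops descending at a chosen depth, and depths_from_flat_list recovers each comment's level from a breadth-first list using id and parent_id. Both function names and the FakeComment stand-in are hypothetical, introduced only so the example runs offline. The sketch assumes that all "more comments" placeholders have already been replaced or removed, and that parent_id is a fullname such as t3_... for the submission or t1_... for a parent comment.

```python
from collections import deque, namedtuple

# Hypothetical stand-in for a PRAW comment, used only so this sketch runs offline.
# The functions below rely on just three attributes: id, parent_id and replies.
FakeComment = namedtuple("FakeComment", "id parent_id body replies")


def walk_to_depth(top_level_comments, max_depth):
    """Breadth-first walk yielding (depth, comment) pairs.

    Top-level comments have depth 0; replies deeper than max_depth are skipped.
    """
    queue = deque((0, comment) for comment in top_level_comments)
    while queue:
        depth, comment = queue.popleft()
        yield depth, comment
        if depth < max_depth:
            queue.extend((depth + 1, reply) for reply in comment.replies)


def depths_from_flat_list(comments):
    """Map comment id -> depth for a list of comments in breadth-first order.

    Assumes every parent appears in the list before its children.
    """
    depths = {}
    for comment in comments:
        prefix, _, parent = comment.parent_id.partition("_")
        # A "t3" prefix means the parent is the submission, so the comment is top-level.
        depths[comment.id] = 0 if prefix == "t3" else depths[parent] + 1
    return depths


# Tiny hand-built thread: c1 -> c2 -> c3, plus a second top-level comment c4.
c3 = FakeComment("c3", "t1_c2", "third", [])
c2 = FakeComment("c2", "t1_c1", "second", [c3])
c1 = FakeComment("c1", "t3_post", "first", [c2])
c4 = FakeComment("c4", "t3_post", "another first", [])

for depth, comment in walk_to_depth([c1, c4], max_depth=1):
    print(depth, comment.body)

print(depths_from_flat_list([c1, c4, c2, c3]))
```

With a real submission, something like walk_to_depth(submission.comments, max_depth=1) should visit only first- and second-level comments, and depths_from_flat_list(submission.comments.list()) should give the level of every comment returned by list() in PRAW 4.0. A deque is used instead of list.pop(0) only because popping from the left of a list is linear in its length.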