Tags: python, json, python-3.x, dictionary, praw

How to assign multiple key:value dicts into an already established dictionary, according to a specific parameter


Note: this makes use of a separate library called PRAW, which isn't critical to understanding the problem. The PRAW-related code is annotated with # !!! in my example below; all you need to know about it is that it produces a list of dicts.

The problem is a couple of layers deep, so I will try my best to explain:

I have JSON data in data.json, which looks like so:

 {
   "USA":[
      {"shortlink":"https://short/74h13v"},
      {"responses":[]}
   ],
   "Vietnam":[
      {"shortlink":"https://short/74gyn4"},
      {"responses":[]}
   ],
   "Italy":[
      {"shortlink":"https://short/74h3i9"},
      {"responses":[]}
   ]
}  

In the module (scraper.py), I have additional data which comes in the form comment.id="39dn28", comment.body="this is a comment".

I am attempting to insert multiple comment.id and comment.body pairs into the [] attached to responses, so that it looks like this:

{"responses": [
        {"39dn28": "this is my response"},
        {"39k229": "I'm another response"},
        {"35sn64": "another comment"} 
 ]}  

Where it gets especially tricky for me is that each group of comments belongs to the ID of a single country (its 'shortlink'). I've extracted the shortlink IDs with shortlinks = [data[link][0]["shortlink"][-6:] for link in data], which results in ['74h3i9', '74gyn4', '74h13v'].
Now I need to match each group of comments to its corresponding shortlink and insert those comments where they belong.

Here is what I've tried so far, for insight into what I have and what I'm trying to accomplish:

import json

with open("data.json", "r") as f:
    data = json.load(f)

shortlinks = [data[link][0]["shortlink"][-6:] for link in data]

for sl_id in shortlinks:
    # !!! (the following code produces a list of comment dicts.)
    submission = reddit.submission(id=sl_id)
    submission.comments.replace_more(limit=0)
    cmt_data = [{comment.id: comment.body} for comment in submission.comments.list()]

    for i in data:
        if sl_id in data[i][0]["shortlink"]:
            data[i][0]["responses"] = cmt_data

print(data)  

This almost works. For some reason, the output also contains additional blank 'responses': [] entries and additional shortlinks.

I cannot seem to figure it out. Help is much appreciated. I am open to alternate ways to accomplish this and alternate ways to store the data (maybe not a list of dicts, etc.).


Solution

  • If you want to get something like this:

    {
       "USA":[
          {"shortlink":"https://short/74h13v"},
          {"responses":[{},{},{}]}
       ],
       ...
    }
    

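    I think the assignment should target the second dict in each list, index [1], since that is the one holding "responses". Indexing [0] adds a new "responses" key to the shortlink dict and leaves the original empty list in place. It should be like this: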

    for sl_id in shortlinks:
        # !!! (the following code produces a list of comment dicts.)
        submission = reddit.submission(id=sl_id)
        submission.comments.replace_more(limit=0)
        cmt_data = [{comment.id: comment.body} for comment in submission.comments.list()]

        for i in data:
            if sl_id in data[i][0]["shortlink"]:
                # "responses" lives in the second dict, so index [1], not [0]
                data[i][1]["responses"] = cmt_data
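
As a possible variation, the matching step can be dropped by looping over the countries once and reading the shortlink ID from each entry directly. The sketch below is only an illustration under some assumptions: the JSON is inlined instead of read from data.json, and fetch_comments and FAKE_COMMENTS are hypothetical stand-ins for the PRAW calls so the example runs without Reddit credentials.

```python
import json

# Same shape as data.json in the question, inlined so the sketch is self-contained.
data = json.loads("""
{
  "USA":     [{"shortlink": "https://short/74h13v"}, {"responses": []}],
  "Vietnam": [{"shortlink": "https://short/74gyn4"}, {"responses": []}],
  "Italy":   [{"shortlink": "https://short/74h3i9"}, {"responses": []}]
}
""")

# Hypothetical sample data standing in for Reddit: submission ID -> (comment.id, comment.body) pairs.
FAKE_COMMENTS = {
    "74h13v": [("39dn28", "this is my response"), ("39k229", "I'm another response")],
    "74gyn4": [("35sn64", "another comment")],
    "74h3i9": [],
}


def fetch_comments(sl_id):
    """Hypothetical helper replacing the PRAW code marked # !!! in the question.

    Returns a list of {comment_id: comment_body} dicts for one submission ID.
    """
    return [{cid: body} for cid, body in FAKE_COMMENTS.get(sl_id, [])]


# Each country maps to a two-item list: the shortlink dict and the responses dict.
for country, (link_entry, responses_entry) in data.items():
    sl_id = link_entry["shortlink"][-6:]  # last six characters are the submission ID
    responses_entry["responses"] = fetch_comments(sl_id)

print(json.dumps(data, indent=2))
```

Because the shortlink and the responses list are unpacked from the same country entry, there is no index to get wrong and no inner loop to match IDs back to countries. With real data, the body of fetch_comments would hold the three PRAW lines from the question.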