Note: this uses a separate library called PRAW, which isn't critical to understanding the problem. The PRAW-related code is annotated in my example below with # !!! to signify that you only need to know that it produces a list of dicts.
The problem is a couple of layers deep, so I will try my best to explain.
I have JSON data in data.json, which looks like so:
{
    "USA":[
        {"shortlink":"https://short/74h13v"},
        {"responses":[]}
    ],
    "Vietnam":[
        {"shortlink":"https://short/74gyn4"},
        {"responses":[]}
    ],
    "Italy":[
        {"shortlink":"https://short/74h3i9"},
        {"responses":[]}
    ]
}
In the module (scraper.py), I have additional data which will come in the form of comment.id="39dn28", comment.body="this is a comment". I am attempting to insert multiple comment.id and comment.body instances into the [] attached to responses so that it looks like so:
{"responses": [
    {"39dn28": "this is my response"},
    {"39k229": "I'm another response"},
    {"35sn64": "another comment"}
]}
Where it gets especially tricky for me is that each group of comments belongs to a single country, identified by the ID at the end of its 'shortlink'. I've extracted the shortlink IDs with shortlinks = [data[link][0]["shortlink"][-6:] for link in data], which results in ['74h3i9', '74gyn4', '74h13v'].
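To make the extraction step concrete, here is a minimal sketch run against the sample data above. It assumes the shortlink ID is always the last six characters of the URL, as in my list comprehension. The name country_by_id is made up for this example; it keeps the country next to each ID, which may make the matching step easier to reason about.

```python
import json

# Sample data copied from data.json as shown above.
raw = """
{
    "USA": [{"shortlink": "https://short/74h13v"}, {"responses": []}],
    "Vietnam": [{"shortlink": "https://short/74gyn4"}, {"responses": []}],
    "Italy": [{"shortlink": "https://short/74h3i9"}, {"responses": []}]
}
"""
data = json.loads(raw)

# Assumption: the ID is the last six characters of each shortlink URL.
# country_by_id is a hypothetical helper mapping, not part of PRAW.
country_by_id = {
    entries[0]["shortlink"][-6:]: country
    for country, entries in data.items()
}

print(country_by_id)
```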
Now I need to match each group of comments to its corresponding shortlink, and input those comments where they correctly belong.
Here is what I've tried so far, for insight into what I have and what I'm trying to accomplish:
import json

with open("data.json", "r") as f:
    data = json.load(f)
shortlinks = [data[link][0]["shortlink"][-6:] for link in data]
for sl_id in shortlinks:
    # !!! (the following code produces a list of comment dicts.)
    submission = reddit.submission(id=sl_id)
    submission.comments.replace_more(limit=0)
    cmt_data = [{comment.id: comment.body} for comment in submission.comments.list()]
    for i in data:
        if sl_id in data[i][0]["shortlink"]:
            data[i][0]["responses"] = cmt_data
print(data)
This almost works. For some reason I also get back additional blank 'responses': [] entries and additional shortlinks.
I cannot seem to figure it out. Help is much appreciated. I am open to alternate ways to accomplish this, and to alternate ways to store the data (maybe not a list of dicts, etc.).
If you want to get something like this:
{
    "USA":[
        {"shortlink":"https://short/74h13v"},
        {"responses":[{},{},{}]}
    ],
    ...
}
I think it should be like this, assigning to index 1 (the dict that holds "responses") instead of index 0 (the dict that holds "shortlink"):
for sl_id in shortlinks:
    # !!! (the following code produces a list of comment dicts.)
    submission = reddit.submission(id=sl_id)
    submission.comments.replace_more(limit=0)
    cmt_data = [{comment.id: comment.body} for comment in submission.comments.list()]
    for i in data:
        if sl_id in data[i][0]["shortlink"]:
            data[i][1]["responses"] = cmt_data
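To check the index change without hitting Reddit, here is a rough sketch that swaps the PRAW calls for a stub. FAKE_COMMENTS, fetch_comments and attach_responses are names invented for this example, and the comment data is canned. It also loops over the countries directly, which should make the separate shortlinks list and the inner matching loop unnecessary, assuming each country has exactly one shortlink.

```python
import copy

data = {
    "USA": [{"shortlink": "https://short/74h13v"}, {"responses": []}],
    "Vietnam": [{"shortlink": "https://short/74gyn4"}, {"responses": []}],
    "Italy": [{"shortlink": "https://short/74h3i9"}, {"responses": []}],
}

# Hypothetical stand-in for the PRAW calls: canned comments per shortlink ID.
FAKE_COMMENTS = {
    "74h13v": [{"39dn28": "this is my response"}, {"39k229": "I'm another response"}],
    "74gyn4": [{"35sn64": "another comment"}],
    "74h3i9": [],
}


def fetch_comments(sl_id):
    """Return a list of {comment_id: body} dicts (stubbed, no network access)."""
    return FAKE_COMMENTS.get(sl_id, [])


def attach_responses(data):
    """Return a copy of data with each country's responses filled in."""
    result = copy.deepcopy(data)
    for country, entries in result.items():
        sl_id = entries[0]["shortlink"][-6:]   # index 0 holds the shortlink
        entries[1]["responses"] = fetch_comments(sl_id)  # index 1 holds responses
    return result


updated = attach_responses(data)
print(updated["USA"])
```

Writing to index 0 instead would add a second "responses" key next to "shortlink" and leave the original empty list at index 1 untouched, which would explain the extra blank 'responses': [] in the output.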