I'm working with the reddit API. For reasons not relevant to this case, I want to work without the reddit wrapper in this scenario. The code is actually very simple: it extracts the comments and first-level replies from a particular post inside a subreddit.
This is the code of the function:
import requests
import pandas as pd

def getcommentsforpost(subredditname, postid):
    # Here we make the request to reddit and create a Python dictionary
    # from the resulting JSON.
    reditpath = '/r/' + subredditname + '/comments/' + postid
    redditusual = 'https://www.reddit.com'
    parameters = '.json?'
    totalpath = redditusual + reditpath + parameters
    p = requests.get(totalpath, headers={'User-agent': 'Chrome'})
    result = p.json()
    # We are going to be looping a lot through dictionaries to extract
    # the comments and their replies, thus a list where we will insert
    # them.
    totallist = []
    # The result object is a list with two dictionaries: one with info
    # on the post, and a second one with all the info regarding the
    # comments and their respective replies. Because of this, we first
    # process the post's info, located in result[0].
    a = result[0]["data"]["children"][0]["data"]
    abody = a["selftext"]
    aauthor = a["author"]
    ascore = a["score"]
    adictionary = {"commentauthor": aauthor, "comment": abody, "Type": "Post",
                   "commentscore": ascore}
    totallist.append(adictionary)
    # And now we start processing the comments, located in result[1].
    for i in result[1]["data"]["children"]:
        ibody = i["data"]["body"]
        iauthor = i["data"]["author"]
        iscore = i["data"]["score"]
        idictionary = {"commentauthor": iauthor, "comment": ibody, "Type": "post_comment",
                       "commentscore": iscore}
        totallist.append(idictionary)
        # To clarify: up to here, the code works perfectly. No problem
        # whatsoever. It's exactly in the following section where the
        # error happens.
        # We create a new object, called replylists, that contains a
        # list of dictionaries in every iteration of the loop.
        replylists = i["data"]["replies"]["data"]["children"]
        # We loop through the replies of every comment.
        for j in replylists:
            jauthor = j["data"]["author"]
            jbody = j["data"]["body"]
            jscore = j["data"]["score"]
            jdictionary = {"commentauthor": jauthor, "comment": jbody, "Type": "comment_reply",
                           "commentscore": jscore}
            # Just like we did with the post info and the normal comments,
            # we extract the reply and put it in totallist.
            totallist.append(jdictionary)
    finaldf = pd.DataFrame(totallist)
    return finaldf

getcommentsforpost("Python", "a7zss0")
But it is while doing that loop for the replies that the code fails. It returns the error 'string indices must be integers', pointing at the variable replylists. However, when I execute the code outside of the loop, like this:
result[1]["data"]["children"][4]["data"]["replies"]["data"]["children"][0]
it works perfectly, and it should have the same effect. I believe it's treating replylists as a string instead of a list (which is its class).
Things I have tried:
I checked the class of replylists with the type() function. It returns "list", but only for 5 iterations of the loop; then it fails with the same error.
I tried looping over the list with for ja in range(0,len(replylists)) and then creating the j variable as replylists[ja]. It gave back the same error.
I have been debugging this for two hours. Without that fragment of the code, the function works perfectly (it does not return replies in the final dataframe, of course, but it works). Why is this happening? replylists is a list of dictionaries, not a string, but it gives that weird error.
Here is the reddit documentation for the endpoint we are using: https://www.reddit.com/dev/api#GET_comments_{article}
Libraries to import: requests, pandas as pd, json
I repeat, recommending a wrapper is not a solution; I want to work this out with JSON and REST.
Working on this: 'Python version 3.6.5 |Anaconda version 5.2.0,jupyter notebook 5.5.0 '
Thank you in advance. I hope it turns out to be interesting; I'll keep working from here.
I've done some digging: I copied your code to a local environment and did some debugging, primarily this:
try:
    replylists = i["data"]["replies"]["data"]["children"]
except:
    for point in i['data']:
        print(point)
    exit()
Through this, I saw that i["data"] does in fact have values (57 of them, actually), and one of the 57 is replies. However, looking through it, I found that the content of replies is empty: 'replies': '' is what I see when I directly print out i for the broken values.
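To see why an empty replies value produces that particular message, here is a minimal sketch that reproduces it without calling reddit at all. The comment_data dictionary is hand-made for illustration and only mimics the shape described above; it is not a real response.

```python
# Hand-made stand-in for i["data"] of a comment that has no replies.
comment_data = {"author": "someone", "body": "hello", "replies": ""}

replies = comment_data["replies"]
print(type(replies))  # a str, not a dict

try:
    # Same access pattern as in the question: indexing a string with a
    # string key is what raises the TypeError.
    replylists = replies["data"]["children"]
except TypeError as error:
    message = str(error)
    print(message)
```

So the failure comes from replies being a string on those iterations; replylists itself never gets assigned there.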
However, all hope is not lost: you've simply forgotten to skip the iterations where the replies content is empty (''). I also ran a check to see how many of your iterations actually failed: some worked, and some failed (for the reason mentioned above).
With this, I advise you to use try and except when you hit an error like this, to debug (it's a useful skill). But also, and more on topic for your question, figure out what you'd like to do when the content of replies is empty.
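One possible way to handle the empty case, as a rough sketch: check the type of replies before indexing into it, and return nothing for comments without replies. The helper name extract_replies and the sample data are made up for this illustration. Skipping children that have no "body" key is an extra assumption on my part (placeholder entries may not carry one), so adjust it to whatever you decide to do.

```python
def extract_replies(comment):
    """Collect the first-level replies of one comment (hypothetical helper)."""
    replies = comment["data"].get("replies")
    # A comment without replies carries '' here, not a dictionary, so skip it.
    if not isinstance(replies, dict):
        return []
    rows = []
    for child in replies["data"]["children"]:
        data = child["data"]
        # Assumption: children lacking a "body" are not real replies; skip them.
        if "body" not in data:
            continue
        rows.append({
            "commentauthor": data["author"],
            "comment": data["body"],
            "Type": "comment_reply",
            "commentscore": data["score"],
        })
    return rows


# Hand-made sample data, not a real response from reddit.
with_reply = {"data": {"replies": {"data": {"children": [
    {"data": {"author": "bob", "body": "I agree", "score": 3}},
]}}}}
without_reply = {"data": {"replies": ""}}

print(extract_replies(with_reply))
print(extract_replies(without_reply))
```

Inside your loop over result[1]["data"]["children"], something like totallist.extend(extract_replies(i)) could then replace the inner for j loop.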
I wish you the best, and I hope this helped.