python, json, api, reddit

My code treats a list of dictionaries like a string: TypeError: string indices must be integers


I'm working with the Reddit API. For reasons not relevant here, I want to work without the Reddit wrapper in this scenario. The code is actually very simple: it extracts the comments and first-level replies from a particular post inside a subreddit.

This is the code of the function:

import requests
import pandas as pd

def getcommentsforpost(subredditname, postid):

    #here we make the request to reddit, and create a python dictionary   
    #from the resulting json code


    reditpath = '/r/' + subredditname + '/comments/' + postid
    redditusual = 'https://www.reddit.com'
    parameters = '.json?'
    totalpath = redditusual + reditpath + parameters
    p = requests.get(totalpath, headers = {'User-agent' : 'Chrome'})
    result = p.json()

    #we are going to be looping a lot through dictionaries, to extract
    # the comments and their replies, thus, a list where we will insert  
    # them.
    totallist = [] 

    # the result object is a list with two dictionaries, one with info 
    #on the post, and the second one with all the info regarding the 
    #comments and their respective replies, because of this, we first 
    # process the posts info located in result[0]


    a = result[0]["data"]["children"][0]["data"]
    abody = a["selftext"]
    aauthor = a["author"]
    ascore = a["score"]
    adictionary = {"commentauthor" : aauthor , "comment" : abody , "Type" : "Post",
                       "commentscore" : ascore}

    totallist.append(adictionary)




    # and now, we start processing the comments, located in result[1]

    for i in result[1]["data"]["children"]:

        ibody = i["data"]["body"]
        iauthor = i["data"]["author"]
        iscore = i["data"]["score"]



        idictionary = {"commentauthor" : iauthor , "comment" : ibody , "Type" : "post_comment",
                       "commentscore" : iscore}

        totallist.append(idictionary)

        # to clarify, up to here the code works perfectly. No problem
        # whatsoever; it's exactly in the following section that the
        # error happens.

        # we create a new object, called replylists, that contains a
        # list of dictionaries in every iteration of the loop.

        replylists =  i["data"]["replies"]["data"]["children"]

        # we loop through them and extract the fields of every reply


        for j in replylists:
            jauthor = j["data"]["author"]
            jbody = j["data"]["body"]
            jscore = j["data"]["score"]


            jdictionary = {"commentauthor" : jauthor , "comment" : jbody , "Type" : "comment_reply" , 
                           "commentscore" : jscore } 
            totallist.append(jdictionary)

        # just like we did with the post info and the normal comments,
        # we extract the reply and put it in totallist.

    # build the dataframe once, after all comments have been processed
    finaldf = pd.DataFrame(totallist)

    return finaldf

getcommentsforpost("Python","a7zss0")

But it is while running that loop for the replies that the code fails. It returns the error 'string indices must be integers', pointing at the variable replylists. However, when I execute the same lookup outside of the loop, like this:

result[1]["data"]["children"][4]["data"]["replies"]["data"]["children"][0]

it works perfectly, and it should have the same effect. I believe it's treating replylists as a string instead of a list (which is its class).

Things I have tried:

I checked with the type() function that replylists is a list. It does return "list", but only for 5 iterations of the loop; then it fails with the same error.

I tried looping with for ja in range(0, len(replylists)) and then setting j to replylists[ja]. It gave the same error.

I have been debugging this for two hours. Without that fragment of the code the function works perfectly (it does not return replies in the final dataframe, of course, but it works). Why is this happening? replylists is a list of dictionaries, not a string, yet it gives this strange error.
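
One way to narrow this down is to check the type of each intermediate value in the chain, not only the final list. The sketch below uses made-up sample data (the `comments` list is a hypothetical stand-in, not a real Reddit response) and assumes that one of the `replies` values is a string. It only illustrates how the same error can appear partway through a loop that worked for earlier items.

```python
# Hypothetical stand-ins for result[1]["data"]["children"]; not real API output.
comments = [
    {"data": {"replies": {"data": {"children": [{"data": {"body": "a reply"}}]}}}},
    {"data": {"replies": ""}},
]

failed_at = None
for index, comment in enumerate(comments):
    replies = comment["data"]["replies"]
    # Show what kind of object sits one level above the list of replies.
    print(index, type(replies).__name__)
    try:
        replylists = replies["data"]["children"]
    except TypeError as error:
        # Raised when `replies` is a str: a string cannot be indexed by "data".
        failed_at = index
        print("failed at comment", index, "->", error)
```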

Here is the Reddit documentation for the endpoint we are using: https://www.reddit.com/dev/api#GET_comments_{article}

Libraries to import: requests, pandas as pd, json

I repeat: recommending the wrapper is not a solution. I want to do this with JSON and REST.

Working on this: 'Python version 3.6.5 | Anaconda version 5.2.0, Jupyter Notebook 5.5.0'

Thank you in advance. I hope it turns out to be interesting; I'll keep working from here.


Solution

  • I did some digging: I copied your code to a local environment and debugged it, primarily with this:

    try:
        replylists = i["data"]["replies"]["data"]["children"]
    except TypeError:
        # print every key of the comment that failed, then stop
        for point in i['data']:
            print(point)
        exit()
    

    Through this, I saw that i["data"] does have values (57 of them, actually), and one of the 57 is replies. However, looking further, I found that the content of replies is empty:

    'replies': '' is what I see when I directly print out i for the broken values.

    However, all hope is not lost: you've simply forgotten to skip the iterations where the replies content is empty (''). Indexing that empty string with ["data"] is what raises 'string indices must be integers'. I also ran a check to see how many of your iterations actually failed: some worked and some failed, for the reason above.

    With this, my advice is to use try and except to debug errors like this (it's a useful skill) and, more on topic for your question, to decide what you'd like to do when the content of replies is empty.

    I wish you the best, and I hope this helped.
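
One possible way to handle the empty case is to skip any comment whose replies value is not a dictionary. The following is a minimal sketch under that assumption; the helper name `extract_replies` and the sample data are hypothetical and are not part of the original code or of the Reddit API.

```python
def extract_replies(comment):
    """Return the first-level replies of one comment as a list of dicts.

    `comment` is assumed to be one element of result[1]["data"]["children"].
    """
    replies = comment["data"].get("replies")
    # As observed above, replies can be '' (an empty string) instead of a
    # nested listing, so only descend when it is actually a dictionary.
    if not isinstance(replies, dict):
        return []
    rows = []
    for reply in replies["data"]["children"]:
        data = reply["data"]
        rows.append({
            "commentauthor": data["author"],
            "comment": data["body"],
            "Type": "comment_reply",
            "commentscore": data["score"],
        })
    return rows


# Hypothetical sample data shaped like the structure described above.
sample_comments = [
    {"data": {"replies": ""}},
    {"data": {"replies": {"data": {"children": [
        {"data": {"author": "bob", "body": "hi", "score": 3}},
    ]}}}},
]

all_replies = []
for comment in sample_comments:
    all_replies.extend(extract_replies(comment))
print(all_replies)
```

Inside the original function, the inner loop could then become `totallist.extend(extract_replies(i))`. Real responses may contain other kinds of entries that lack these keys; that case is not handled in this sketch.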