Python - IndexError: list index out of range even though checking if empty


I'm getting the IndexError: list index out of range on the following line of code:

if tweetSplit[i] != "":

in my code:

tweetSplit = tweet.split(' ') 

for i in range(len(tweetSplit)):
    #print (i)
    if not tweetSplit:
        break
    if tweetSplit[i] != "":
         #print (tweetSplit[i])
         #print (tweetSplit[i][:1])
        if tweetSplit[i][:1] == '@':
            del tweetSplit[i]

I thought by checking if tweetSplit is empty using "if not tweetSplit" I wouldn't run into the out of range error. Here's the full error:

Traceback (most recent call last):
  File "parseTweets.py", line 55, in <module>
     if tweetSplit[i] != "":
IndexError: list index out of range

Solution

  • Your test doesn't really do much good.

    Sure, "if not tweetSplit" checks whether tweetSplit is empty. But it doesn't check whether tweetSplit is at least i+1 elements long.

    And range(len(tweetSplit)) is computed once, before the loop starts, while you're deleting from tweetSplit in the middle of the loop. So if you delete even one element, then by the last iteration the list is shorter than i+1, and tweetSplit[i] raises an IndexError.

    This is one of the reasons you should never delete or insert in the middle of looping over any collection. (But not the only one. For example, when you delete element i, every later element shifts down one index; the next iteration then checks index i+1, which holds what was originally element i+2, so the original element i+1 is never examined.)
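
    To see both problems at once, here is a minimal sketch that runs the question's deletion loop on a small made-up list. The sample words and the delete_forward helper are illustrative assumptions, not part of the original code:

```python
# Hypothetical sample data, not taken from the original question.
words = "@alice @bob hello @carol".split(' ')


def delete_forward(items):
    """Run the question's forward-deleting loop on a copy.

    Returns the list as it stood when the loop stopped, plus the name of
    the exception that stopped it (or None if it finished normally).
    """
    items = list(items)
    error = None
    try:
        for i in range(len(items)):   # range is built once, from the original length
            if items[i][:1] == '@':
                del items[i]          # shrinks the list; later items shift down one index
    except IndexError as exc:
        error = type(exc).__name__
    return items, error


result, error = delete_forward(words)
print(result, error)
```

    With this input, '@bob' survives because it slid into index 0 right after '@alice' was deleted, and the loop still tries index 3 on a list that by then has only two items.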


    If you want to build a collection of all the items that match some rule (here, the words of the tweet that don't start with '@'), it's much easier to do that by building a new list:

    goodTweets = []
    for tweet in tweetSplit:
        if tweet[:1] != '@':
            goodTweets.append(tweet)
    

    Or:

    goodTweets = [tweet for tweet in tweetSplit if tweet[:1] != '@']
    

    If you really do need to mutate tweetSplit for some reason, there are tricks you can use, but they're all a bit ugly.

    Build a new list, then change tweetSplit into that list:

    tweetSplit[:] = [tweet for tweet in tweetSplit if tweet[:1] != '@']
    

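    Or do it with a generator expression, so you don't write out the new list yourself. (CPython still collects the generated items into a temporary sequence before replacing the contents, which is why reading from tweetSplit on the right-hand side is safe.)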

    tweetSplit[:] = (tweet for tweet in tweetSplit if tweet[:1] != '@')
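
    The difference between rebinding a name and assigning to tweetSplit[:] only shows up when something else holds a reference to the same list. A small sketch, using made-up sample data and a hypothetical second name, alias:

```python
words = ["@alice", "hello", "@bob", "world"]   # hypothetical sample data
alias = words                                  # a second name for the same list object

# Building a new list: `filtered` is a separate object,
# and `words` / `alias` are left untouched.
filtered = [w for w in words if w[:1] != '@']
untouched = list(alias)                        # snapshot to show nothing changed yet

# Slice assignment: the contents of the existing list object are replaced,
# so every name that refers to it sees the change.
words[:] = (w for w in words if w[:1] != '@')

print(filtered)
print(untouched)
print(alias is words, alias)
```

    If nothing else refers to the list, plain reassignment (tweetSplit = [...]) is usually all you need.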
    

    Or iterate backward. While len(tweetSplit) may change as you delete, 0 never does. (And while the positions of everything from i: may change, the positions of :i never do.)

    for i in reversed(range(len(tweetSplit))):
        if tweetSplit[i][:1] == '@':
            del tweetSplit[i]
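
    As a quick sanity check, the backward loop can be compared against the list comprehension. This is only a sketch on made-up data, including two adjacent '@' words, which is the case that trips up the forward loop:

```python
words = ["@alice", "@bob", "hello", "@carol", "world"]  # hypothetical sample data
expected = [w for w in words if w[:1] != '@']           # reference result

# Visit indices from the end: 4, 3, 2, 1, 0.
for i in reversed(range(len(words))):
    if words[i][:1] == '@':
        # Deleting index i only shifts items at higher indices,
        # and those have already been visited.
        del words[i]

print(words == expected)
```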
    

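    However, if you're trying to do this in place as a performance optimization, all of these are usually slower than simply building a new list. The only thing likely to be faster is something like the following, which overwrites each matching element with the last element and pops the tail. Note that it does not preserve the order of the remaining elements: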

    i = 0
    while i < len(tweetSplit):
        if tweetSplit[i][:1] == '@':
            tweetSplit[i] = tweetSplit[-1]
            tweetSplit.pop()
        else:
            i += 1
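
    To make the ordering caveat concrete, here is the same swap-and-pop loop wrapped in a function and run on made-up data. The function name remove_mentions_unordered and the sample words are assumptions for illustration only:

```python
def remove_mentions_unordered(items):
    """Remove '@'-prefixed strings in place; the survivors may be reordered."""
    i = 0
    while i < len(items):
        if items[i][:1] == '@':
            items[i] = items[-1]   # overwrite the match with the last element
            items.pop()            # drop the tail copy; popping the end is cheap
        else:
            i += 1                 # advance only when nothing was removed,
                                   # so the moved element is checked too
    return items


words = ["@alice", "hello", "@bob", "big", "world"]  # hypothetical sample data
result = remove_mentions_unordered(words)
print(result)
```

    Here 'world' ends up at the front because it was moved into the slot '@alice' used to occupy. If the order of the words matters, stick with one of the earlier approaches.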