I'm getting the IndexError: list index out of range on the following line of code:
    if tweetSplit[i] != "":
in my code:
    tweetSplit = tweet.split(' ')
    for i in range(len(tweetSplit)):
        #print (i)
        if not tweetSplit:
            break
        if tweetSplit[i] != "":
            #print (tweetSplit[i])
            #print (tweetSplit[i][:1])
            if tweetSplit[i][:1] == '@':
                del tweetSplit[i]
I thought by checking if tweetSplit is empty using "if not tweetSplit" I wouldn't run into the out of range error. Here's the full error:
    Traceback (most recent call last):
      File "parseTweets.py", line 55, in <module>
        if tweetSplit[i] != "":
    IndexError: list index out of range
Your test doesn't really do much good.
Sure, "if not tweetSplit" checks whether tweetSplit is empty. But it doesn't check that tweetSplit is at least i+1 elements long, which is what tweetSplit[i] needs.
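And range(len(tweetSplit)) is evaluated once, before the loop starts, so i still counts up to the original length. Because you're deleting from tweetSplit in the middle of the loop, deleting even one element means that by the end the list is shorter than i+1 elements, and tweetSplit[i] raises an IndexError.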
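A minimal sketch of that failure, assuming a made-up word list in place of the real tweet.split(' ') result (the sample words and the names words, indices, and error_index are illustrative only):

```python
# Hypothetical sample data standing in for tweet.split(' ').
words = ['@alice', 'hello', '@bob', 'world']

indices = range(len(words))   # fixed at range(0, 4) before the loop starts
error_index = None

try:
    for i in indices:
        if words[i][:1] == '@':
            del words[i]      # the list shrinks, but the range does not
except IndexError:
    error_index = i           # first index that no longer exists

print(words, error_index)     # prints ['hello', 'world'] 2
```

After two deletions the list has only two elements, so the lookup at index 2 fails even though the list is not empty.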
This is one of the reasons you should avoid deleting or inserting while looping over a collection. It is not the only one, though. When you delete element i, every later element moves down one slot, so the element that was at i+1 is now at i. The loop then moves on to index i+1, which holds what was originally element i+2, which means you skipped one.
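The skipped-element problem can be seen in isolation with a sketch like the following. It assumes a made-up list with two adjacent mentions, and adds a bounds guard (not in the original code) so that only the skipping shows up rather than the IndexError:

```python
# Hypothetical sample data with two adjacent mentions.
words = ['@alice', '@bob', 'hello']

original_length = len(words)
for i in range(original_length):
    if i >= len(words):       # guard so the loop cannot raise IndexError
        break
    if words[i][:1] == '@':
        del words[i]          # '@bob' slides from index 1 down to index 0

# '@bob' is never examined, because the loop has already moved past index 0.
print(words)                  # prints ['@bob', 'hello']
```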
If you want to build a collection of all tweets that match some rule, it's much easier to do that by building a new list:
    goodTweets = []
    for tweet in tweetSplit:
        if tweet[:1] != '@':
            goodTweets.append(tweet)
Or:
    goodTweets = [tweet for tweet in tweetSplit if tweet[:1] != '@']
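Put together, the build-a-new-list approach might look like the sketch below. The function name strip_mentions and the sample tweet are assumptions for illustration. The final join step is also an assumption about what you want to do with the filtered words.

```python
def strip_mentions(tweet):
    """Return the tweet text with every word that starts with '@' removed."""
    words = tweet.split(' ')
    # word[:1] is '' for an empty string, so this is safe where word[0] would raise.
    kept = [word for word in words if word[:1] != '@']
    return ' '.join(kept)

print(strip_mentions('@alice hello @bob world'))   # prints hello world
```

Slicing with [:1] rather than indexing with [0] matters here because split(' ') produces empty strings when the text contains consecutive spaces.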
If you really do need to mutate tweetSplit for some reason, there are tricks you can use, but they're all a bit ugly.
Build a new list, then change tweetSplit into that list:
    tweetSplit[:] = [tweet for tweet in tweetSplit if tweet[:1] != '@']
Or, do it without building the new list explicitly:
    tweetSplit[:] = (tweet for tweet in tweetSplit if tweet[:1] != '@')
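What makes slice assignment a mutation rather than a rebinding is that any other name referring to the same list sees the change. A small sketch, with the names words, alias, and rebound made up for illustration:

```python
words = ['@alice', 'hello', '@bob', 'world']
alias = words                 # a second name for the same list object

# Slice assignment replaces the contents of the existing list.
words[:] = [w for w in words if w[:1] != '@']
same_object = alias is words  # both names still refer to one list

# Plain assignment would instead bind a name to a brand-new list.
rebound = [w for w in alias if w != 'hello']

print(alias, same_object, rebound is alias)
# prints ['hello', 'world'] True False
```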
Or iterate backward. While len(tweetSplit) may change as you delete, 0 never does. (And while the positions of everything at index i and above, tweetSplit[i:], may change, the positions of everything below i, tweetSplit[:i], never do.)
    for i in range(len(tweetSplit))[::-1]:
        if tweetSplit[i][:1] == '@':
            del tweetSplit[i]
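The same backward loop can also be spelled with the built-in reversed(), which some readers find clearer than the [::-1] slice. A sketch on made-up sample data, including adjacent mentions that a forward loop would mishandle:

```python
# Hypothetical sample data; the adjacent mentions would trip up a forward loop.
words = ['@alice', '@bob', 'hello', '@carol']

for i in reversed(range(len(words))):
    if words[i][:1] == '@':
        del words[i]          # only elements above i shift, and those are done

print(words)                  # prints ['hello']
```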
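However, if you're trying to do this in-place as a performance optimization, all of these are usually slower than just building a new list. The only thing likely to be faster is something like this, which avoids shifting elements by overwriting each deleted slot with the last element, at the cost of not preserving the order of the remaining elements: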
    i = 0
    while i < len(tweetSplit):
        if tweetSplit[i][:1] == '@':
            tweetSplit[i] = tweetSplit[-1]
            tweetSplit.pop()
        else:
            i += 1
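To see what this swap-and-pop loop does to the list, here is the same logic wrapped in a function and run on made-up data. The function name remove_mentions_unordered and the sample words are assumptions for illustration.

```python
def remove_mentions_unordered(words):
    """Remove '@' words in place by overwriting each with the last element."""
    i = 0
    while i < len(words):
        if words[i][:1] == '@':
            words[i] = words[-1]   # move the last element into the gap
            words.pop()            # drop the now-duplicated last slot
        else:
            i += 1                 # advance only when nothing was removed

words = ['@alice', 'hello', '@bob', 'big', 'world']
remove_mentions_unordered(words)
print(words)                       # prints ['world', 'hello', 'big']
```

All the mentions are gone, but the survivors are no longer in their original order, so this only suits cases where order does not matter.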