I can't figure out why this gives me an 'IndexError: list index out of range'. I am reading from a simple CSV file and trying to get the values out, separated by commas.
with open('project_twitter_data.csv','r') as twf:
    tw = twf.readlines()[1:] # I don't need the very first line
    for i in tw:
        linelst = i.strip().split(",")
        RT = linelst[1]
        RP = linelst[2]
        rows = "{}, {}".format(RT,RP)
My output looks like this:
print(tw) # the original strings.
..\nBORDER Terrier puppy. Name is loving and very protective of the people she loves. Name2 is a 3 year old Maltipoo. Name3 is an 8 year old Corgi.,4,6\nREASON they did not rain but they will reign beautifully couldn't asked for a crime 80 years in the Spring Name's Last Love absolutely love,19,0\nHOME surrounded by snow in my Garden. But City Name people musn't: such a good book: RT @twitteruser The Literature of Conflicted Lands after a,0,0\n\n"
print (i)
..
BORDER Terrier puppy. Name is loving and very protective of the people she loves. Name2 is a 3 year old Maltipoo. Name3 is an 8 year old Corgi.,4,6
REASON they did not rain but they will reign beautifully couldn't asked for a crime 80 years in the Spring Name's Last Love absolutely love,19,0
HOME surrounded by snow in my Garden. But City Name people musn't: such a good book: RT @twitteruser The Literature of Conflicted Lands after a,0,0
print(linelst)
..
['BORDER Terrier puppy. Name is loving and very protective of the people she loves. Name2 is a 3 year old Maltipoo. Name3 is an 8 year old Corgi.', '4', '6']
["REASON they did not rain but they will reign beautifully couldn't asked for a crime 80 years in the Spring Name's Last Love absolutely love", '19', '0']
["HOME surrounded by snow in my Garden. But City Name people musn't: such a good book: RT @twitteruser The Literature of Conflicted Lands after a", '0', '0']
['']
print(rows)
..
4, 6
19, 0
0, 0
# the error
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-7-f27e87689f41> in <module>
6 linelst = i.strip().split(",")
7 # print(linelst)
----> 8 RT = linelst[1]
9 RP = linelst[2]
IndexError: list index out of range
What am I doing wrong?
I have also noticed that a list containing only an empty string, [''], appeared at the very end of my lists after I used strip().split(","). I can delete it with twf.readlines()[1:][:-1], yet the error still persists. Thank you for any advice.
Your final line, after stripping, is empty, so split produces a list of just the empty string. That list has only one element, so indexing it with [1] raises the IndexError.
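A quick way to see this behaviour, as a minimal sketch (the sample strings here are made up for illustration and are not taken from the file):

```python
# Hypothetical sample lines: one complete row and one blank trailing line.
good_line = "some tweet text,4,6\n"
blank_line = "\n"

good_fields = good_line.strip().split(",")
blank_fields = blank_line.strip().split(",")

print(good_fields)   # ['some tweet text', '4', '6']
print(blank_fields)  # ['']

# The blank line yields a one-element list, so index 1 does not exist.
try:
    blank_fields[1]
    raised = False
except IndexError:
    raised = True
print(raised)  # True
```

Splitting an empty string on an explicit separator returns a list holding one empty string rather than an empty list, which is why the stray [''] shows up in the printed output.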
The simplest solution is to explicitly skip empty lines:
with open('project_twitter_data.csv','r') as twf:
    next(twf, None)  # Advance past first line without needing to slurp whole file into memory and
                     # slice it, tying peak memory usage to max line size, not size of file
    for line in twf:
        line = line.strip()
        if not line:
            continue
        linelst = line.split(",")
        # If non-empty, but incomplete lines should be ignored:
        if len(linelst) < 3:
            continue
        RT = linelst[1]
        RP = linelst[2]
        rows = "{}, {}".format(RT,RP)
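The header-skipping idiom used above can be tried without the real file. This is only a sketch: io.StringIO stands in for the opened file, and the helper name skip_header and the header column names are hypothetical.

```python
import io

def skip_header(lines):
    """Advance past the first line; the None default avoids StopIteration on empty input."""
    next(lines, None)
    return lines

# io.StringIO stands in for the real file here (an assumption for the demo).
with_rows = io.StringIO("text,retweets,replies\nhello,4,6\n")
empty = io.StringIO("")

remaining = list(skip_header(with_rows))
remaining_empty = list(skip_header(empty))
print(remaining)        # ['hello,4,6\n']
print(remaining_empty)  # []
```

Because the file object is consumed one line at a time, nothing beyond the current line has to be held in memory, and the None default means an empty file is skipped quietly instead of raising.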
Or simpler, using EAFP patterns and the csv module, which you should always be using when dealing with CSV files (the format is a lot more complex than just "split on commas"):
import csv
with open('project_twitter_data.csv', 'r', newline='') as twf:  # newline='' needed for proper CSV dialect handling
    csvf = csv.reader(twf)
    next(csvf, None)  # Advance past first row without needing to slurp whole file into memory and
                      # slice it, tying peak memory usage to max line size, not size of file
    for row in csvf:
        try:
            RT, RP = row[1:3]
        except ValueError:
            continue  # Didn't have enough elements, incomplete line
        rows = "{}, {}".format(RT,RP)
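To illustrate why plain splitting is fragile, here is a small sketch with a made-up row whose text field contains a comma (whether the real file has such rows is an assumption; the sample is not taken from it):

```python
import csv
import io

# Made-up row whose first field contains a comma, quoted as CSV writers normally do.
raw = '"Hello, world",4,6\n'

naive = raw.strip().split(",")
parsed = next(csv.reader(io.StringIO(raw, newline="")))

print(naive)   # ['"Hello', ' world"', '4', '6']
print(parsed)  # ['Hello, world', '4', '6']

# With the naive split, index 1 is a fragment of the text rather than the first count.
print(naive[1], parsed[1])
```

The naive version does not fail here; it silently shifts every column by one, which is harder to notice than an IndexError.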
Note: In both cases, I made some minor improvements to avoid large temporary lists, and tweaked some minor things to improve readability (naming a str variable i is bad form; i is generally used for indices, or at least integers, and you had a clearer name readily available, so even a placeholder like x would be inappropriate).