Tags: python-3.x, string, for-loop, readlines

IndexError: list index out of range in a loop of readlines()


I can't figure out why this raises 'IndexError: list index out of range'. I am reading a simple CSV file and trying to extract the comma-separated values.

with open('project_twitter_data.csv','r') as twf:

    tw = twf.readlines()[1:] # I don't need the very first line

    for i in tw:
        linelst = i.strip().split(",")

        RT = linelst[1]
        RP = linelst[2]

        rows = "{}, {}".format(RT,RP)

My output looks like this:


print(tw) # the original strings.
..\nBORDER Terrier puppy. Name is loving and very protective of the people she loves. Name2 is a 3 year old Maltipoo. Name3 is an 8 year old Corgi.,4,6\nREASON they did not rain but they will reign beautifully couldn't asked for a crime 80 years in the Spring Name's Last Love absolutely love,19,0\nHOME surrounded by snow in my Garden. But City Name people musn't: such a good book: RT @twitteruser The Literature of Conflicted Lands after a,0,0\n\n"

print(i)
..
BORDER Terrier puppy. Name is loving and very protective of the people she loves. Name2 is a 3 year old Maltipoo. Name3 is an 8 year old Corgi.,4,6

REASON they did not rain but they will reign beautifully couldn't asked for a crime 80 years in the Spring Name's Last Love absolutely love,19,0

HOME surrounded by snow in my Garden. But City Name people musn't: such a good book: RT @twitteruser The Literature of Conflicted Lands after a,0,0

print(linelst)
..
['BORDER Terrier puppy. Name is loving and very protective of the people she loves. Name2 is a 3 year old Maltipoo. Name3 is an 8 year old Corgi.', '4', '6']
["REASON they did not rain but they will reign beautifully couldn't asked for a crime 80 years in the Spring Name's Last Love absolutely love", '19', '0']
["HOME surrounded by snow in my Garden. But City Name people musn't: such a good book: RT @twitteruser The Literature of Conflicted Lands after a", '0', '0']
['']

print(rows) 
..
4, 6
19, 0
0, 0


# the error
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-7-f27e87689f41> in <module>
     6         linelst = i.strip().split(",")
     7 #        print(linelst)
----> 8         RT = linelst[1]
     9         RP = linelst[2]
   

IndexError: list index out of range

What am I doing wrong?

I have also noticed that a list containing a single empty string, [''], appears at the very end of my lists after I use strip().split(","). I can drop that last line with twf.readlines()[1:][:-1], yet the error still persists. Thank you for any advice.
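A minimal reproduction of what happens on that last line may help. It uses a made-up, in-memory list of lines in place of the real file (the sample text is hypothetical), with a final line that is just a newline:

```python
# Minimal reproduction: a made-up list of lines standing in for the file,
# where the final line is blank (just a newline)
lines = ["text one,4,6\n", "text two,19,0\n", "\n"]

for line in lines:
    print(line.strip().split(","))

# With an explicit separator, str.split always returns at least one element,
# so an empty string becomes [''] rather than []
empty = "\n".strip().split(",")
print(empty, len(empty))  # [''] 1

try:
    empty[1]  # only index 0 exists
except IndexError as exc:
    print("IndexError:", exc)  # IndexError: list index out of range
```

Because str.split with a separator never returns an empty list, every blank line in the file, wherever it appears, produces [''] and fails at linelst[1].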


Solution

  • Your final line is empty after stripping, so split(",") produces a list containing just the empty string, [''], and linelst[1] fails because that list has only one element.

    The simplest solution is to skip empty lines explicitly:

    with open('project_twitter_data.csv','r') as twf:
    
        next(twf, None)  # Skip the header line. Iterating the file lazily (instead of calling
                         # readlines() and slicing) ties peak memory to the longest line, not the file size
    
        for line in twf:
            line = line.strip()
            if not line:
                continue
            linelst = line.split(",")
    
            # Also skip lines that are non-empty but incomplete (fewer than 3 fields):
            if len(linelst) < 3:
                continue
    
            RT = linelst[1]
            RP = linelst[2]
    
            rows = "{}, {}".format(RT,RP)
    

    Or, more simply, use the EAFP style ("easier to ask forgiveness than permission") together with the csv module, which you should always use when dealing with CSV files (the format is a lot more complex than just "split on commas"):

    import csv
    
    with open('project_twitter_data.csv', 'r', newline='') as twf:  # newline='' lets csv handle newlines inside quoted fields correctly
        csvf = csv.reader(twf)
        next(csvf, None)  # Skip the header row. Iterating lazily (instead of calling
                          # readlines() and slicing) ties peak memory to the longest line, not the file size
    
        for row in csvf:
            try:
                RT, RP = row[1:3]
            except ValueError:
                continue  # Fewer than 3 fields: blank line (csv yields []) or incomplete row
     
            rows = "{}, {}".format(RT,RP)
    

    Note: In both versions I made a few minor changes to avoid building large temporary lists, and tweaked some names for readability. Naming a str variable i is bad form: i is conventionally used for indices, or at least integers. Here a clearer name (line or row) was readily available, so even a generic placeholder like x would be inappropriate.
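To illustrate why the csv module is safer than splitting on commas, here is a small sketch that runs the EAFP approach against an in-memory sample. The header names, the sample rows, and the helper read_counts are hypothetical, and unlike the code above it collects the pairs into a list instead of overwriting rows on each iteration:

```python
import csv
import io

# Hypothetical sample: a header, a tweet text containing commas (so it is quoted),
# a plain row, and a trailing blank line
SAMPLE = (
    "tweet_text,retweet_count,reply_count\n"
    "\"Snow, books, and tea\",4,6\n"
    "plain text,19,0\n"
    "\n"
)


def read_counts(f):
    """Return (RT, RP) string pairs from columns 1 and 2, skipping the header
    and any row with fewer than three fields (including blank lines)."""
    reader = csv.reader(f)
    next(reader, None)  # skip the header row
    pairs = []
    for row in reader:
        try:
            RT, RP = row[1:3]
        except ValueError:
            continue  # blank line ([]) or incomplete row
        pairs.append((RT, RP))
    return pairs


pairs = read_counts(io.StringIO(SAMPLE))
print(pairs)  # [('4', '6'), ('19', '0')]

# Naive splitting breaks the quoted field apart and shifts the columns
naive = SAMPLE.splitlines()[1].split(",")
print(naive[1:3])  # [' books', ' and tea"']
```

With csv.reader the quoted field stays intact and the blank line arrives as an empty list, which the except ValueError branch skips; the naive split puts fragments of the text where the counts should be.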