Tags: python, pandas, indexing, split, index-error

Trying To Store an Index From df['column_name'].str.split(' ')[index] is Throwing an Index Error in Pandas


I am working with a data set from Kaggle on NBA All-Stars (https://www.kaggle.com/fmejia21/nba-all-star-game-20002016) [link for anyone who wants to run it themselves]. The data set looks like this:

In [3]: df1.head(3)
Out[3]: 
   Year         Player Pos  ...                       Selection Type   NBA Draft Status    Nationality
0  2016  Stephen Curry   G  ...  Western All-Star Fan Vote Selection  2009 Rnd 1 Pick 7  United States
1  2016   James Harden  SG  ...  Western All-Star Fan Vote Selection  2009 Rnd 1 Pick 3  United States
2  2016   Kevin Durant  SF  ...  Western All-Star Fan Vote Selection  2007 Rnd 1 Pick 2  United States

[3 rows x 9 columns]

What I am trying to do is grab the draft position under the 'NBA Draft Status' column and store it in another column, so I begin just by checking the split:

In [4]: df1['NBA Draft Status'].str.split(' ')
Out[4]: 
0       [2009, Rnd, 1, Pick, 7]
1       [2009, Rnd, 1, Pick, 3]

So it seems simple enough: just grab the item at index 4 (the pick number). If it's a second-round pick, add 30 to that number. I use this:

In [5]: positions = []
   ...: for draft in df1['NBA Draft Status']:
   ...:     if 'Rnd 2' in draft:
   ...:         position = draft.split(' ')[4]
   ...:         position = int(position) + 30
   ...:         positions.append(position)
   ...:     else:
   ...:         position = draft.split(' ')[4]
   ...:         position = int(position)
   ...:         positions.append(position)

and it throws an index error:

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-5-0946ed392ea2> in <module>
      6         positions.append(position)
      7     else:
----> 8         position = draft.split(' ')[4]
      9         position = int(position)
     10         positions.append(position)

IndexError: list index out of range

So my question is: why is it out of range? While investigating, I found that I can print the item at this index, but for whatever reason I can't append it to an empty list. This works:

In [6]: for draft in df1['NBA Draft Status']:
   ...:     print(draft.split(' ')[4])
   ...:     break
   ...: 
7

Can someone explain to me what is going on? I know this is rather wordy but I didn't know how else to convey the problem without giving some backdrop to the data set.


Solution

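  • The problem is that some values in df1['NBA Draft Status'] contain only 3 spaces, so calling .split() on them returns a list of 4 items. With 0-based indexing the last valid index in such a list is 3, so [4] raises the IndexError. Your print test works only because it breaks after the first row, which happens to be well-formed. You can find the offending rows like this: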

    df1['length'] = df1['NBA Draft Status'].apply(lambda draft: len(draft.split()))
    df2 = df1.loc[df1.length == 4,:]
    df2['NBA Draft Status']
    Out[74]: 
    309    1996 NBA Draft, Undrafted
    334    1996 NBA Draft, Undrafted
    346    1998 NBA Draft, Undrafted
    348    1996 NBA Draft, Undrafted
    360    1996 NBA Draft, Undrafted
    371    1998 NBA Draft, Undrafted
    Name: NBA Draft Status, dtype: object
    

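    To drop them, run df1 = df1.loc[df1.length == 5,:] and then rerun your code; the loop should then get through every row without the IndexError. Note that this removes the undrafted players from df1.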
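
If you would rather keep the undrafted players, one possible alternative is to pull the numbers out with a regular expression instead of splitting. This is only a sketch, not the approach used above: the three-row frame is a made-up stand-in for the Kaggle data, the 'Draft Position' column name is hypothetical, and it assumes every drafted row follows the 'YYYY Rnd R Pick P' pattern shown in the question.

```python
import pandas as pd

# Tiny stand-in for the Kaggle data (assumption: same formats as in the question).
df1 = pd.DataFrame({
    "NBA Draft Status": [
        "2009 Rnd 1 Pick 7",
        "2005 Rnd 2 Pick 10",
        "1996 NBA Draft, Undrafted",
    ]
})

# Capture the round and pick numbers; rows that do not match the pattern become NaN.
parts = df1["NBA Draft Status"].str.extract(r"Rnd (\d+) Pick (\d+)").astype(float)
rnd, pick = parts[0], parts[1]

# Follow the question's rule: add 30 to second-round picks only.
# "Draft Position" is a hypothetical column name chosen for this example.
df1["Draft Position"] = pick + 30 * (rnd == 2)

print(df1)
```

Undrafted rows end up as NaN in the new column rather than raising an error, so you can decide later whether to drop them or fill in a placeholder.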