Tags: python, multi-index, numpy-slicing

Python: Slicing a dataframe by a list returns a longer list than expected


I have a list of 517 tuples. When I use that list to slice my dataframe with .loc, the result somehow has 518 rows. In case it matters, the 517 tuples are labels of a multi-index. Visual inspection of the result shows no obvious header or null rows.

print(submatrix2.shape)
x = list(get_list_of_university_towns().itertuples(index=False, name=None))
print(len(x))
univ_matrix = submatrix2.loc[x,] 
print(univ_matrix.shape)

Outputs:

(10730, 1)
517
(518, 1)

What could be causing this mismatch?


Solution

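  • Your index probably contains duplicate labels. .loc returns every row that matches each label you pass, so a label that occurs twice in the index contributes two rows, and the result can have more rows than the list has entries.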

    Reproducible example:

    import pandas as pd

    # The label 1 appears twice in the index.
    df = pd.DataFrame({'vals': ["a", "b", "c", "d"],
                       'n': [0, 1, 1, 2]})

    df = df.set_index('n')
    print(df)
    
        vals
    n   
    0   a
    1   b
    1   c
    2   d
    

    Now selecting three labels returns four rows, because the label 1 matches two rows:

    >>> x = [0, 1, 2]
    >>> len(x)
    3
    >>> df.loc[x,:].shape
    (4, 1)
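
To check whether this is what is happening in the multi-index case, something along these lines may help. It is only a sketch: the frame `df`, the level names `state` and `town`, and the list `labels` are made-up stand-ins for `submatrix2` and `x` from the question, not the asker's real data.

```python
import pandas as pd

# Hypothetical data standing in for submatrix2: a two-level MultiIndex
# in which the label ("B", "y") occurs twice.
index = pd.MultiIndex.from_tuples(
    [("A", "x"), ("B", "y"), ("B", "y"), ("C", "z")],
    names=["state", "town"],
)
df = pd.DataFrame({"vals": [1, 2, 3, 4]}, index=index)

# Hypothetical stand-in for the list of tuples x.
labels = [("A", "x"), ("B", "y"), ("C", "z")]

# False whenever at least one label occurs more than once in the index.
index_is_unique = df.index.is_unique

# Every row whose label is repeated (keep=False marks all occurrences).
duplicated_rows = df[df.index.duplicated(keep=False)]

# One possible fix: keep only the first row per label, then select.
deduplicated = df[~df.index.duplicated(keep="first")]
selected = deduplicated.loc[labels, :]

print(index_is_unique)
print(duplicated_rows)
print(df.loc[labels, :].shape, selected.shape)
```

Whether keeping the first occurrence is the right fix depends on why the duplicates exist. If the repeated rows carry different values, it is probably better to inspect `duplicated_rows` and decide how to combine them before dropping anything.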