I have a list of 517 tuples. When I use that list to select rows from my DataFrame with .loc, the result somehow has 518 rows. In case it matters, the 517 tuples are labels of a MultiIndex. Visual inspection of the result shows no obvious header or null rows.
print(submatrix2.shape)
x = list(get_list_of_university_towns().itertuples(index=False, name=None))
print(len(x))
univ_matrix = submatrix2.loc[x,]
print(univ_matrix.shape)
Outputs:
(10730, 1)
517
(518, 1)
What could be causing this mismatch?
You probably have duplicate labels in your index. .loc returns every row that matches each label you pass, so the result can have more rows than your list has entries.
Reproducible example:
import pandas as pd

df = pd.DataFrame({'vals': ["a", "b", "c", "d"],
                   'n': [0, 1, 1, 2]})
df = df.set_index('n')
  vals
n
0    a
1    b
1    c
2    d
Now selecting three labels returns four rows, because label 1 appears twice in the index:
>>> x=[0,1,2];len(x)
3
>>> df.loc[x,:].shape
(4, 1)
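To confirm this is what is happening, you can check the index for repeated labels. The snippet below is a minimal sketch that rebuilds the toy frame from above; the names `dup_rows` and `deduped` are only illustrative, and dropping duplicates with `keep="first"` is just one possible fix that assumes the extra rows are unwanted. `is_unique` and `duplicated` are also available on a MultiIndex, so the same checks should apply to `submatrix2.index`.

```python
import pandas as pd

# Same toy frame as above: label 1 appears twice in the index.
df = pd.DataFrame({'vals': ["a", "b", "c", "d"],
                   'n': [0, 1, 1, 2]}).set_index('n')
x = [0, 1, 2]

# False means .loc can return more rows than labels requested.
index_is_unique = df.index.is_unique

# keep=False marks every copy of a repeated label, so all offending rows are shown.
dup_rows = df[df.index.duplicated(keep=False)]

# One possible fix (an assumption about what you want): keep only the
# first row for each label, then select.
deduped = df[~df.index.duplicated(keep="first")]

print(index_is_unique)
print(dup_rows)
print(df.loc[x, :].shape, deduped.loc[x, :].shape)
```

If the duplicated rows carry different values, as `b` and `c` do here, look at `dup_rows` before dropping anything so you can decide which row to keep.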