python pandas performance indexing processing-efficiency

Getting a long list of specific elements from lists of row and column coordinates in Pandas


import pandas as pd

data = pd.DataFrame([[1, 2, 3], [21, 23, 24], [31, 32, 33]])

x = [0, 1, 2]  # row positions; this is the same as the index

y = [1, 2, 0]  # column positions

data.iloc[x,y] gives me a 3x3 df, which I do not need.

I need to run this on a large df and would like to get only the ELEMENTS at (row, column) positions (0,1), (1,2), (2,0) of the dataframe: 2, 24, 31. So I'd like to have the most efficient solution. I can obviously use a for loop with iterrows, or even something like: data.apply(lambda row: row.iloc[y[int(row.name)]], axis=1).values

Is the for/apply already the fastest solution?

Isn't there a more direct way of getting only the elements, not a slice, of a df when you have a list (or Series, or df) of index/column coordinates?

Thanks


Solution

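  • Based on the documentation, it seems that data.to_numpy()[x, y] is a reasonable approach. NumPy's integer array indexing pairs the two lists element-wise, so it returns one value per (row, column) pair instead of the 3x3 cross product that data.iloc[x, y] gives.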
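
As a minimal sketch of that approach, using the frame and coordinate lists from the question (the names picked and block are only illustrative), something like the following should work. It assumes x and y hold integer positions rather than labels, and that the frame has a single numeric dtype. With mixed dtypes, to_numpy() may fall back to an object array and cost an extra copy.

```python
import pandas as pd

data = pd.DataFrame([[1, 2, 3], [21, 23, 24], [31, 32, 33]])
x = [0, 1, 2]  # row positions
y = [1, 2, 0]  # column positions

# NumPy integer-array indexing pairs x[k] with y[k],
# so this picks the cells (0, 1), (1, 2) and (2, 0).
picked = data.to_numpy()[x, y]
print(picked)  # [ 2 24 31]

# For comparison: .iloc with two lists selects every row/column
# combination, which is the 3x3 frame the question wants to avoid.
block = data.iloc[x, y]
print(block.shape)  # (3, 3)
```

If the coordinates are labels rather than positions, they would first have to be translated to positions (for example with data.index.get_indexer and data.columns.get_indexer) before indexing the array.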