I am trying to recreate the operation below in Koalas. It works in pandas, but the same code throws an error in Koalas.
Operation tried in Pandas:
import pandas as pd
df = pd.DataFrame({'foo':['a','b','c','d','e'], 'bar':['1', '2', '3','4','5']})
df1 = pd.DataFrame({'foo':['a','b','c'], 'bar':['1', '2', '3']})
ci = [4,32,12,1]
df[df.index.get_level_values(0).isin(ci)]
Output:
  foo bar
1   b   2
4   e   5
Operation tried in Koalas:
import databricks.koalas as ks
df = ks.DataFrame({'foo':['a','b','c','d','e'], 'bar':['1', '2', '3','4','5']})
df1 = ks.DataFrame({'foo':['a','b','c'], 'bar':['1', '2', '3']})
ci = [4,32,12,1]
df[df.index.get_level_values(0).isin(ci)]
Output:
PandasNotImplementedError: The method pd.Index.__iter__() is not implemented. If you want to collect your data as an NumPy array, use 'to_numpy()' instead.
It looks like Index.get_level_values() is using __iter__() behind the scenes, which is not supported in Koalas.
A couple of thoughts:
Why use get_level_values() at all? df[df.index.isin(ci)] works just as well.
The "proper" way to index with missing labels would be to use .reindex(). It fills the rows for labels that are not in the original index with NaNs, which you'll have to drop:
new_df = df.reindex(index=ci).dropna()
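To compare the two suggestions side by side, here is a minimal sketch. It uses plain pandas so it runs anywhere; that the same two expressions behave identically on a Koalas DataFrame is an assumption I have not verified here. The names by_mask and by_reindex are just illustrative.

```python
import pandas as pd

df = pd.DataFrame({'foo': ['a', 'b', 'c', 'd', 'e'],
                   'bar': ['1', '2', '3', '4', '5']})
ci = [4, 32, 12, 1]

# Option 1: boolean mask on the index.
# Labels in ci that are not in the index (32, 12) are simply ignored,
# and the rows keep their original order.
by_mask = df[df.index.isin(ci)]

# Option 2: reindex to ci, then drop the all-NaN rows created for
# labels that were missing (32, 12). Rows follow the order of ci.
by_reindex = df.reindex(index=ci).dropna()

print(by_mask)
print(by_reindex)
```

Both return the same two rows, but in different order: the mask keeps the frame's order (1, 4), while reindex follows ci (4, 1). Also note that dropna() removes any row containing a NaN, so if your data has legitimate missing values, the isin() mask is the safer choice.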