What im looking to achieve: I was wondering what would be the best way to find any value of a list in a pandas Dataframe, and then to separate the row where the value is found.
Here is what i Tried
def searchXlsx(xlsx_file: str, numbers: list) -> list:
with open(os.path.realpath(xlsx_file), "rb") as i:
i = pd.read_excel(i)
print(i[i.isin(numbers)])
i would like to return the row it is located
You can use .isin
:
import numpy as np
import pandas as pd
searchvals = np.array([1,45,87,9]) #The values to find
#Create a random dataframe of integers with 100k rows and 10 columns
size = (100000, 10)
columns = [f"col_{x}" for x in range(size[1])]
df = pd.DataFrame(data=np.random.randint(low=0, high=1000, size=size), columns=columns)
matches = df.isin(searchvals).any(axis=1) #A series of True/False where each value correspond to a row
df2 = df.loc[matches] #Extract the matching rows into a new dataframe
df2.head()
# col_0 col_1 col_2 col_3 col_4 col_5 col_6 col_7 col_8 col_9
# 34 793 893 435 312 812 87 990 552 496 397
# 47 818 902 478 479 736 83 135 45 180 224
# 55 9 750 118 910 205 866 617 232 506 755
# 94 326 303 868 9 583 310 668 614 720 50
# 99 167 148 586 570 196 473 829 87 266 194