python pandas numpy data-science text-classification

Pandas: get rows whose index value is in another dataframe's column


I have an Information Gain dataframe and a term frequency (tf) dataframe. The data looks like this:

Information Gain

    Term      IG
0   alqur     0.641328
1   an        0.641328
2   ayatayat  0.641328
3   bagai     0.641328
4   bantai    0.641328
5   besar     0.641328

Term Frequency

            A   B   A+B
ahli        1   0   1
alas        1   0   1
alqur       0   1   1
an          0   1   1
ayatayat    0   1   1
...        ... ... ...
terus       0   1   1
tuduh       0   1   1
tulis       1   0   1
ulama       1   0   1
upaya       0   1   1

Let's call the Information Gain table IG and the term frequency table TF.

I want to check whether each IG.Term is in TF.index and, if it is, get that row's values, so the result should look like this:

    Term      A    B    A+B
0   alqur     0    1    1
1   an        0    1    1
2   ayatayat  0    1    1
3   bagai     1    0    1
4   bantai    1    1    2
5   besar     1    0    1

NB: I don't need the IG values anymore.


Solution

  • Filter with Index.isin and boolean indexing, then convert the index to a column:

    df = TF[TF.index.isin(IG['Term'])].rename_axis('Term').reset_index()
    print(df)
           Term  A  B  A+B
    0     alqur  0  1    1
    1        an  0  1    1
    2  ayatayat  0  1    1
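
For anyone who wants to run this, here is a minimal self-contained sketch of the filter above. The two frames are rebuilt only from the rows visible in the question; the rows hidden behind "..." are left out, which is an assumption of this example and the reason only three terms match.

```python
import pandas as pd

# IG table, copied from the rows shown in the question.
IG = pd.DataFrame({
    "Term": ["alqur", "an", "ayatayat", "bagai", "bantai", "besar"],
    "IG": [0.641328] * 6,
})

# TF table, limited to the rows visible in the question (elided rows omitted).
TF = pd.DataFrame(
    {
        "A": [1, 1, 0, 0, 0, 0, 0, 1, 1, 0],
        "B": [0, 0, 1, 1, 1, 1, 1, 0, 0, 1],
    },
    index=["ahli", "alas", "alqur", "an", "ayatayat",
           "terus", "tuduh", "tulis", "ulama", "upaya"],
)
TF["A+B"] = TF["A"] + TF["B"]

# Boolean mask: True where the TF index label appears in IG["Term"].
mask = TF.index.isin(IG["Term"])

# Keep the matching rows, name the index "Term", and turn it into a column.
df = TF[mask].rename_axis("Term").reset_index()
print(df)
```

Note that the rows come back in TF's order, not IG's, and that IG terms missing from TF are silently dropped.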
    

  • Or use DataFrame.merge with the default inner join:

    df = IG[['Term']].merge(TF, left_on='Term', right_index=True)
    print(df)
           Term  A  B  A+B
    0     alqur  0  1    1
    1        an  0  1    1
    2  ayatayat  0  1    1
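
A runnable sketch of the merge approach, again assuming sample frames rebuilt from the rows visible in the question. The `how="left"` variant is not part of the answer above; it is added here only to show one way to keep IG terms that have no row in TF and to list them.

```python
import pandas as pd

# IG terms, copied from the question (the IG values are not needed here).
IG = pd.DataFrame({
    "Term": ["alqur", "an", "ayatayat", "bagai", "bantai", "besar"],
})

# TF table, limited to the rows visible in the question (elided rows omitted).
TF = pd.DataFrame(
    {
        "A": [1, 1, 0, 0, 0, 0, 0, 1, 1, 0],
        "B": [0, 0, 1, 1, 1, 1, 1, 0, 0, 1],
    },
    index=["ahli", "alas", "alqur", "an", "ayatayat",
           "terus", "tuduh", "tulis", "ulama", "upaya"],
)
TF["A+B"] = TF["A"] + TF["B"]

# Inner join (the default): only terms present in both frames, in IG's order.
inner = IG[["Term"]].merge(TF, left_on="Term", right_index=True)
print(inner)

# Left join: every IG term is kept; terms without a TF row get NaN counts.
left = IG[["Term"]].merge(TF, left_on="Term", right_index=True, how="left")

# Terms from IG that were not found in TF's index.
missing = left.loc[left["A+B"].isna(), "Term"].tolist()
print(missing)
```

The inner join follows the order of IG, whereas the isin filter follows the order of TF, which may matter if the IG table is sorted by score.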