
How to select the rows where the last column of a DataFrame contains a specific value


Duration Protocol Direction Label
12        tcp     bi        normal-V45
2         udp     one       Botnet-45
2         icmp    bi        Botnet-68
3         tcp     one       normal-V73
5         udp     bi        Background-tcp
3         icmp    one       Background

I want to select the rows whose last column (Label) is either normal or Botnet. The condition to check is whether Label contains normal or Botnet (here normal-V45 and normal-V73 both count as normal, and likewise for Botnet). So the output should be:

Duration Protocol Direction Label
12        tcp     bi        normal-V45
2         udp     one       Botnet-45
2         icmp    bi        Botnet-68
3         tcp     one       normal-V73

I tried the following in pandas, but all of the data still comes through in the CSV. data1 is the DataFrame that holds all the data, and data1.iloc[:,-1].str is meant to select the last column. Any help is appreciated; thanks a lot in advance.

datagrouped = data1.loc[~data1.iloc[:,-1].str == 'Botnet']


Solution

  • Use .str.contains with a regex and boolean indexing:

    df[df.Label.str.contains(r'normal|Botnet')]
    

    Output:

       Duration Protocol Direction       Label
    0        12      tcp        bi  normal-V45
    1         2      udp       one   Botnet-45
    2         2     icmp        bi   Botnet-68
    3         3      tcp       one  normal-V73
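
    For completeness, here is a minimal self-contained sketch of the same idea that picks the last column by position, as the question does with iloc, rather than by name. The DataFrame is rebuilt by hand from the sample in the question; the variable names (last_col, mask, selected) and the na=False argument are assumptions added for illustration, not part of the original answer.

    ```python
    import pandas as pd

    # Sample data copied from the question.
    df = pd.DataFrame({
        "Duration": [12, 2, 2, 3, 5, 3],
        "Protocol": ["tcp", "udp", "icmp", "tcp", "udp", "icmp"],
        "Direction": ["bi", "one", "bi", "one", "bi", "one"],
        "Label": ["normal-V45", "Botnet-45", "Botnet-68",
                  "normal-V73", "Background-tcp", "Background"],
    })

    # Select the last column by position instead of by name.
    last_col = df.iloc[:, -1]

    # Boolean mask: True where the label contains "normal" or "Botnet".
    # na=False treats missing labels as non-matches (an assumption here).
    mask = last_col.str.contains(r"normal|Botnet", na=False)

    # Boolean indexing keeps only the matching rows.
    selected = df[mask]
    print(selected)
    ```

    The match is case-sensitive by default, so "botnet" in lower case would not be selected; passing case=False to str.contains relaxes that if needed.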