Duration Protocol Direction Label
12 tcp bi normal-V45
2 udp one Botnet-45
2 icmp bi Botnet-68
3 tcp one normal-V73
5 udp bi Background-tcp
3 icmp one Background
I want to select the rows whose last column (Label) is either normal or Botnet. The condition should check whether Label contains normal or Botnet (here normal-V45 and normal-V73 count as normal, and likewise for Botnet). So the output should be:
Duration Protocol Direction Label
12 tcp bi normal-V45
2 udp one Botnet-45
2 icmp bi Botnet-68
3 tcp one normal-V73
I use the following in pandas, but all the data still comes out in the CSV. Here data1 is the DataFrame holding all the data, and data1.iloc[:,-1].str is meant to select the last column. Any help is appreciated; thanks in advance.
datagrouped = data1.loc[~data1.iloc[:,-1].str == 'Botnet']
Use .str.contains with a regex and boolean indexing:
df[df.Label.str.contains(r'normal|Botnet')]
Output:
Duration Protocol Direction Label
0 12 tcp bi normal-V45
1 2 udp one Botnet-45
2 2 icmp bi Botnet-68
3 3 tcp one normal-V73
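For completeness, here is a minimal self-contained sketch of the same approach using the sample data from the question. It keeps the question's names data1 and datagrouped and selects the last column by position, as the question does. The na=False argument is my own addition, on the assumption that the label column might contain missing values; drop it if that cannot happen.

```python
import pandas as pd

# Sample data copied from the question.
data1 = pd.DataFrame({
    "Duration": [12, 2, 2, 3, 5, 3],
    "Protocol": ["tcp", "udp", "icmp", "tcp", "udp", "icmp"],
    "Direction": ["bi", "one", "bi", "one", "bi", "one"],
    "Label": ["normal-V45", "Botnet-45", "Botnet-68",
              "normal-V73", "Background-tcp", "Background"],
})

# Boolean mask: True where the last column contains "normal" or "Botnet".
# na=False (an assumption) treats missing labels as non-matching.
mask = data1.iloc[:, -1].str.contains(r"normal|Botnet", na=False)

# Boolean indexing keeps only the rows where the mask is True.
datagrouped = data1[mask]
print(datagrouped)
```

The match is case-sensitive, so "Normal" or "botnet" would not be selected; passing case=False to str.contains relaxes that if needed.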