I have a dataframe which looks like this:
import pandas as pd
saf_data = {'col1': ['U1', 'U2', 'U3', 'U4'], 'col2': ['1', '2|6', '4a|6a', '6b']}
saf_df = pd.DataFrame(saf_data)
I want to do the following
col1
that contain the "text" 6
in col2
So the expected outcome should contain ['U2', 'U3', 'U4']
or be a dataframe that looks like this:
col1 col2
0 U2 2|6
1 U3 4a|6a
2 U4 6b
Can someone help me how I can do that? Please I am looking for the most efficient and general code, because my dataframe is quite large.
IIUC str.contains
saf_df[saf_df.col2.str.contains('6')]
Out[51]:
col1 col2
1 U2 2|6
2 U3 4a|6a
3 U4 6b
Since you mentioned your df
is quit large, I will recommend using for loop
here, see the timing below
%timeit saf_df[saf_df.col2.str.contains('6')]
10 loops, best of 3: 20.1 ms per loop
%timeit saf_df[['6' in x for x in saf_df.col2]]
100 loops, best of 3: 3.14 ms per loop