I'm trying to find a substring in a frozenset, but I'm running out of options.
My data structure is a pandas DataFrame (it comes from the association_rules function of the mlxtend package, if you are familiar with that one), and I want to print all the rows where the antecedents (which are frozensets) include a specific string.
print(rules[rules["antecedents"].str.contains('line', regex=False)])
However, whenever I run it, I get an empty DataFrame.
When I run only the inner expression on my series rules["antecedents"], I get False for every entry. Why is that?
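Because the Series.str.* methods are meant for string data only. Since your elements are not strings, these methods never look at their string representation: for non-string elements you get NaN (or, as you saw, False) regardless of what str(x) would contain. To demonstrate: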
>>> import numpy as np
>>> import pandas as pd
>>> x = pd.DataFrame(np.random.randn(2, 5)).astype("object")
>>> x
          0          1          2           3           4
0  -1.17191   -1.92926  -0.831576  -0.0814279    0.099612
1  -1.55183  -0.494855    1.14398    -1.72675  -0.0390948
>>> x[0].str.contains("-1")
0   NaN
1   NaN
Name: 0, dtype: float64
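To see that it is only the element type that gets in the way, here is a minimal sketch that converts the frozensets to their string representation with astype(str) before calling .str.contains. The table below is a hypothetical stand-in for the association_rules output; the item names and the confidence values are invented for illustration:

```python
import pandas as pd

# Hypothetical stand-in for mlxtend's association_rules output:
# only the "antecedents" column matters here; items and values are invented.
rules = pd.DataFrame({
    "antecedents": [
        frozenset({"online_order", "coupon"}),
        frozenset({"store_visit"}),
        frozenset({"pipeline_lead", "email"}),
    ],
    "confidence": [0.8, 0.5, 0.6],
})

# astype(str) calls str() on each frozenset, producing text such as
# "frozenset({'store_visit'})", so the .str methods have real strings to search.
as_text = rules["antecedents"].astype(str)
mask = as_text.str.contains("line", regex=False)

print(rules[mask])
```

This should select the same rows as the apply approach, since both end up searching str(x).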
What you can do is use apply:
>>> x[0].apply(lambda v: "-1" in str(v))
0    True
1    True
Name: 0, dtype: bool
So your code should read:
print(rules[rules["antecedents"].apply(lambda x: 'line' in str(x))])
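One caveat: 'line' in str(x) searches the whole text of the frozenset, including the "frozenset(" wrapper, the braces and the quotes. The sketch below (hypothetical data and hypothetical helper names) shows that a pattern such as 'set' then matches every row, and uses any() over the items as an alternative that only looks inside the item strings themselves:

```python
import pandas as pd

# Hypothetical rules table; item names are invented for illustration.
rules = pd.DataFrame({
    "antecedents": [
        frozenset({"online_order", "coupon"}),
        frozenset({"store_visit"}),
        frozenset({"pipeline_lead", "email"}),
    ],
})


def has_substring_in_repr(items, sub):
    # Hypothetical helper: searches the full text "frozenset({...})",
    # wrapper and quotes included (what 'line' in str(x) does).
    return sub in str(items)


def has_substring_in_item(items, sub):
    # Hypothetical helper: searches each item separately;
    # assumes every item in the frozenset is a string.
    return any(sub in item for item in items)


col = rules["antecedents"]

# For "line" both approaches select the same rows.
by_repr = col.apply(lambda x: has_substring_in_repr(x, "line"))
by_item = col.apply(lambda x: has_substring_in_item(x, "line"))

# For "set" the repr search matches every row, because every
# string representation starts with "frozenset(".
set_by_repr = col.apply(lambda x: has_substring_in_repr(x, "set"))
set_by_item = col.apply(lambda x: has_substring_in_item(x, "set"))

print(rules[by_item])
```

The any() version also avoids matching across two items or on punctuation, at the cost of assuming the items are strings.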
You might want to use 'line' in x instead if you mean an exact match on an element of the frozenset.
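For the exact-match case, a minimal sketch of 'line' in x applied to the column, plus a superset test (x >= wanted, equivalent to x.issuperset(wanted)) for requiring several items at once. The data and the name wanted are hypothetical:

```python
import pandas as pd

# Hypothetical rules table; item names are invented for illustration.
rules = pd.DataFrame({
    "antecedents": [
        frozenset({"line", "coupon"}),
        frozenset({"online_order"}),
        frozenset({"line", "email"}),
    ],
})

col = rules["antecedents"]

# Exact element match: frozenset membership, so "online_order" does not count.
has_line = col.apply(lambda x: "line" in x)

# Several required items at once: x >= wanted is the frozenset superset test.
wanted = frozenset({"line", "coupon"})  # hypothetical set of required items
has_all = col.apply(lambda x: x >= wanted)

print(rules[has_line])
print(rules[has_all])
```

Membership and superset tests on frozensets are hash lookups, so they do not depend on how the set happens to be printed.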