I am trying to build a list of numerical column indices, selecting every column whose name contains any string from a feature list.
I have attempted to use a list comprehension with conditional statements. However, the code raises a TypeError: "'in <string>' requires string as left operand, not bool".
import pandas as pd
feature_list = ['a', 'b']
x = pd.DataFrame({"data_a":[1,2,3], "data_b":[1,2,3], "data_c":[1,2,3]})
numerical_index_list = [x.columns.get_loc(a) for a in [b for b in list(x.columns) if any(c for c in feature_list) in b]]
Would anybody be able to help me write a conditional list comprehension that gives me the columns containing the strings a or b, i.e. ["data_a", "data_b"]?
You could use feature_list as a set and check whether it intersects with the column name. This seems to be the approach you're going for, although I believe it is faulty: the column name is treated as a collection of characters, and the word "data" has an "a" in it, so every column passes that test.
features = set(feature_list)
cols = x.columns
[cols.get_loc(c) for c in cols if features.intersection(c)]
#[0, 1, 2]
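To see why all three columns pass, here is a small sketch that prints the characters each column name shares with the feature set. The names `column_names` and `shared_by_column` are made up for this illustration.

```python
# Passing a string to set.intersection iterates over its characters,
# so the "a" in "data" matches the feature "a" for every column.
features = {"a", "b"}
column_names = ["data_a", "data_b", "data_c"]  # same names as in the question

shared_by_column = {name: features.intersection(name) for name in column_names}

for name, shared in shared_by_column.items():
    print(name, sorted(shared))
```

Every column shares at least the character "a" with the set, so none of them gets filtered out.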
Maybe use a better way of deciding whether a column matches something in feature_list? Something like if c[-1] in features? This way only the first two columns pass and the last one doesn't, since "c" isn't in feature_list.
[cols.get_loc(c) for c in cols if c[-1] in feature_list]
#[0, 1]
Or, more relevant to your comment, just remove "data_" from the column name and use the first method.
[cols.get_loc(c) for c in cols if features.intersection('_'.join(c.split('_')[1:]))]
#[0, 1]
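As for the original TypeError: any(c for c in feature_list) is evaluated first and returns a bool, so the comprehension ends up testing True in b, which Python rejects when b is a string. Below is a minimal sketch of one way to restructure it. It assumes the feature always appears as a "_<feature>" suffix of the column name, which holds for the sample data but may not hold for yours; `suffixes`, `substring_matches`, and `matching_columns` are names introduced here.

```python
import pandas as pd

feature_list = ["a", "b"]
x = pd.DataFrame({"data_a": [1, 2, 3], "data_b": [1, 2, 3], "data_c": [1, 2, 3]})
cols = x.columns

# Moving the "in" test inside any() removes the TypeError, but a plain
# substring test still matches every column because "data" contains "a".
substring_matches = [cols.get_loc(c) for c in cols if any(f in c for f in feature_list)]

# Assumption: features are "_<feature>" suffixes. str.endswith accepts a tuple.
suffixes = tuple("_" + f for f in feature_list)
numerical_index_list = [cols.get_loc(c) for c in cols if c.endswith(suffixes)]
matching_columns = [cols[i] for i in numerical_index_list]

print(substring_matches)
print(numerical_index_list)
print(matching_columns)
```

If the features can appear elsewhere in the name, the suffix test would need to be swapped for whatever rule actually identifies them.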