
Creating a Conditional Numerical List of Features in Python


I am trying to build a list of numerical column indexes from a DataFrame, keeping only the columns whose name contains any of the strings in a feature list.

I have attempted to use a list comprehension with a conditional, but the code raises a TypeError: 'in <string>' requires string as left operand, not bool.

import pandas as pd
feature_list = ['a', 'b']

x = pd.DataFrame({"data_a":[1,2,3], "data_b":[1,2,3], "data_c":[1,2,3]})

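# Raises TypeError: any(...) is evaluated first and yields a bool, which is then used as the left operand of `in`
numerical_index_list = [x.columns.get_loc(a) for a in [b for b in list(x.columns) if any(c for c in feature_list) in b]]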
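A minimal sketch of where the TypeError comes from, reusing the same x and feature_list: any(c for c in feature_list) is evaluated on its own and collapses to the bool True, so the condition becomes True in b, and the in operator on a string needs a string on the left. Moving the membership test inside any() removes the error, but every column then matches, because all the names start with "data", which itself contains an "a".

```python
import pandas as pd

feature_list = ['a', 'b']
x = pd.DataFrame({"data_a": [1, 2, 3], "data_b": [1, 2, 3], "data_c": [1, 2, 3]})

# any() runs first and reduces the generator of non-empty strings to one bool.
condition = any(c for c in feature_list)
print(condition)  # True

# This is what the original comprehension ends up evaluating for each column.
try:
    _ = True in "data_a"
except TypeError as err:
    error_message = str(err)
    print(error_message)

# Putting the membership test inside any() fixes the TypeError...
numerical_index_list = [
    x.columns.get_loc(col)
    for col in x.columns
    if any(feature in col for feature in feature_list)
]
# ...but every column matches, since "data" contains "a".
print(numerical_index_list)  # [0, 1, 2]
```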

Could anybody help me write a conditional list comprehension that keeps only the columns containing the strings "a" or "b", i.e. ["data_a", "data_b"]?


Solution

  • You could turn feature_list into a set and check whether it intersects the column name. This seems to be the approach you were going for, but I believe it is faulty: the word "data" contains an "a", so every column passes the test.

    features = set(feature_list)
    cols = x.columns
    [cols.get_loc(c) for c in cols if features.intersection(c)]
    #[0, 1, 2]
    

    You probably need a better way of deciding whether a column matches feature_list, for example if c[-1] in features, which looks only at the last character of the name. Then only the first two columns pass; "data_c" fails because "c" is not in feature_list.

    [cols.get_loc(c) for c in cols if c[-1] in features]
    #[0, 1]
    

    Or, more relevant to your comment, strip the "data_" prefix from the column name and then use the first method:

    [cols.get_loc(c) for c in cols if features.intersection('_'.join(c.split('_')[1:]))]
    #[0, 1]
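The set and c[-1] tests both work one character at a time: features.intersection(...) iterates over the characters of the name, and c[-1] looks only at the last one, so a multi-letter feature such as a hypothetical 'ab' would never match. Here is a sketch that instead compares the whole part after the first underscore, assuming every column follows the prefix_feature naming from the question. The helper names feature_indices and feature_indices_vectorized are made up for illustration. The second helper does the same test over the whole Index with pandas' Index.str.contains and an anchored regular expression.

```python
import re

import numpy as np
import pandas as pd


def feature_indices(columns, feature_list):
    """Hypothetical helper: positions of columns whose suffix after the first
    '_' exactly equals one of the features (assumes 'prefix_feature' names)."""
    features = set(feature_list)
    return [
        columns.get_loc(col)
        for col in columns
        # partition() splits at the first '_'; the suffix is '' if there is none.
        if col.partition('_')[2] in features
    ]


def feature_indices_vectorized(columns, feature_list):
    """Hypothetical helper: same test as above, done with one regex over the Index."""
    # ^[^_]*_ consumes the prefix up to the first '_'; re.escape guards
    # against features that contain regex metacharacters.
    pattern = r'^[^_]*_(?:' + '|'.join(map(re.escape, feature_list)) + r')$'
    mask = columns.str.contains(pattern)
    return np.flatnonzero(mask).tolist()


x = pd.DataFrame({"data_a": [1, 2, 3], "data_b": [1, 2, 3], "data_c": [1, 2, 3]})
print(feature_indices(x.columns, ['a', 'b']))             # [0, 1]
print(feature_indices_vectorized(x.columns, ['a', 'b']))  # [0, 1]

# A multi-letter feature, which the character-based tests cannot handle.
y = pd.DataFrame({"data_ab": [1], "data_b": [1], "data_c": [1]})
print(feature_indices(y.columns, ['ab']))                 # [0]
```

Exact suffix matching avoids the "data contains a" false positive without relying on features being single characters. The regex version is only worth it for very wide frames.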