Search code examples
pythonpandasfilterconditional-statementslines-of-code

Build expression for pandas loc to filter data using a certain column according to length of a list


I am pretty new to python programming and I want to use Pandas .loc to filter data using a certain column according to the length of a products' list.

I have the (condensed) list:

products_selected = ["Apple", "Banana", "Strawberry"]

I produced the following very poor code (condensed) to reach my goal for the column PRODUCT in the Pandas DataFrame:

if len(products_selected) == 1:
    data = data.loc[(data.PRODUCT == products_selected[0])]
   
elif len(products_selected) == 2:
    data = data.loc[(data.PRODUCT == products_selected[0]) | (data.PRODUCT == products_selected[1])]

elif len(products_selected) == 3:
    data = data.loc[(data.PRODUCT == products_selected[0]) | (data.PRODUCT == products_selected[1]) | (data.PRODUCT == products_selected[2])]

How can I do this the pythonic way?

And what's more - independent to the length of the list without having to expand my poor coding manually?

I can't use reduce() or something like that, it should be done without additional functions except pandas or numpy.

It looks so easy but due to my limited coding experience I didn't manage it.


Solution

  • Slicing may help.

    https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#indexing-slicing-with-labels

    [in]:
    import pandas
    data = pandas.DataFrame({'PRODUCT': ["Apple", "Banana", "Strawberry"]})
    
    products_selected = ["Apple"]
    print(data[:len(products_selected)])
    
    products_selected = ["Apple", "Banana"]
    print(data[:len(products_selected)])
    
    products_selected = ["Apple", "Banana", "Strawberry"]
    print(data[:len(products_selected)])
    
    [out]:
      PRODUCT
    0   Apple
      PRODUCT
    0   Apple
    1  Banana
          PRODUCT
    0       Apple
    1      Banana
    2  Strawberry
    

    ... but it makes no assumption about returning the same values from data that are in products_selected.

    For example:

    [in]:
    import pandas
    data = pandas.DataFrame({'PRODUCT': ["Apple", "Banana", "Strawberry"]})
    
    products_selected = ["Apple", "Strawberry"]
    data[:len(products_selected)]
    
    [out]:
        PRODUCT
    0   Apple
    1   Banana
    

    There is also a reference in your example to countries_selected. I made the assumption that it was a typo and should have also been products_selected.