
Python: I need to extract certain values from a data frame and create a new data frame


Sorry for asking, but I'm a Python noob and I need some help.

I have this CSV file (https://www.kaggle.com/jtrofe/beer-recipes) and I need to extract certain values from it.

I want to create a new data frame with the same columns, keeping only the rows whose "Style" column is one of: American IPA, American Pale Ale, Saison, American Light Lager, and American Amber Ale.

Can someone help me?

Thanks!


Solution

  • Use `.loc` indexing with a boolean mask built by `Series.isin`:

    import pandas as pd
    
    # Read in the full data set, check its size
    original_df = pd.read_csv('recipeData.csv', encoding='latin-1')
    print(original_df.size)  # 1698803 cells (.size is rows x columns)
    
    # Store your desired styles for filtering in a python list
    styles_list = [
        "American IPA",
        "American Pale Ale",
        "Saison",
        "American Light Lager",
        "American Amber Ale",
    ]
    
    # Filter using .loc and a boolean mask (checking if each 'Style' value is in your list)
    new_df = original_df.loc[original_df['Style'].isin(styles_list)]
    print(new_df.size)  # 608419 cells, same columns, fewer rows
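If you want to try the same `.isin` + `.loc` filter without downloading the Kaggle file, here is a minimal, self-contained sketch. The toy DataFrame, its `Name` and `ABV` columns, and all row values are made up for illustration; only the `Style` column name and the five style names come from the question. It uses `.shape` to show rows and columns separately, since `.size` multiplies them together.

```python
import pandas as pd

# Hypothetical toy data standing in for recipeData.csv (made-up rows, not the real dataset)
df = pd.DataFrame({
    "Name": ["Recipe A", "Recipe B", "Recipe C", "Recipe D", "Recipe E"],
    "Style": ["American IPA", "Imperial Stout", "Saison", "Witbier", "American Amber Ale"],
    "ABV": [6.5, 9.0, 5.8, 4.9, 5.5],
})

# The styles to keep, as listed in the question
styles_list = [
    "American IPA",
    "American Pale Ale",
    "Saison",
    "American Light Lager",
    "American Amber Ale",
]

# Boolean mask: True where the row's Style is one of the wanted styles
mask = df["Style"].isin(styles_list)

# .loc with a boolean mask keeps the matching rows and every column;
# .copy() makes an independent frame so later edits to new_df don't
# trigger pandas' SettingWithCopyWarning
new_df = df.loc[mask].copy()

# .shape reports (rows, columns); .size would report rows * columns
print(df.shape)                  # (5, 3)
print(new_df.shape)              # (3, 3)
print(new_df["Style"].tolist())  # ['American IPA', 'Saison', 'American Amber Ale']
```

The filtered frame keeps the original index labels (here 0, 2 and 4); call `new_df.reset_index(drop=True)` if you want them renumbered from 0.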