Search code examples
pythonsubset

How to create a new df from multiple columns of an existing df while returning results for one column based on specified criteria of another?


My question has two part(s): A) I want to use the map() or other function to replace '1.0' and '0.0' in one column of a df -- corresponding to 'Rainy' and 'Sunny', respectively -- to read 'Rainy' and 'Sunny in a new column in the same df.

...in a separate, but related question, B) I want to create a new df from an existing one (ref. above), but only return observations from column 'A' ('Sales') when specified criterium is met in the same observation row in column 'B' ('Weather (Rainy 1, Sunny 0)') -- either 'Rainy' or 'Sunny':

*Here's the actual question for which I want to solve: *

1.Using the Dataframe_All_Locations, create two new dataframes called Rainy and Sunny where you'll filter on the Weather Status Accordingly.

The two columns of interest are 'Sales' and 'Weather (Rainy 1, Sunny 0)'. The 'Weather (Rainy 1, Sunny 0)' column essentially contains Boolean values -- '1.0' corresponding to 'Rainy' and '0.0' corresponding to 'Sunny'.

I attempted the following code to solve part A:

perth_df['Actual Weather'] = perth_df['Weather (Rainy 1, Sunny 0)'].str.contains([1.0, 0.0]) == ['Rainy', 'Sunny']

I attempted the following code to solve part B:

rainy_df = dataframe_all_locations[['Sales', 'Weather (Rainy 1, Sunny 0)' == 1.0]]

rainy_df = dataframe_all_locations('Sales & Weather & (Rainy 1, Sunny 0) == 1.0')

sunny_df = dataframe_all_locations('Sales & Weather & (Rainy 1, Sunny 0) == 0.0')

I might need to use np.arrange or np.where, but I'm lost as to where to start.

These are all throwing errors in my notebook.


Solution

    1. You said you want to use map() but I didn't see your attempt about using it in part A.
    2. try to understand how filter/slicing works in pandas DataFrame
    3. Search the error to find the solutions, it really helps.
    import pandas as pd
    
    # demo df from your description
    df = pd.DataFrame([[150,0.0],[200,1.0],[100,1.0],[250,0.0]], columns = ['Sales','Weather (Rainy 1, Sunny 0)'])
    
    # part A
    # first create a dict for mapping, {you_target:desire_mapping}
    mapping_dict = {1:'Rainy',0:'Sunny'}
    # apply the map to the column and save it to a new column
    df['Actual Weather'] = df['Weather (Rainy 1, Sunny 0)'].map(mapping_dict)
    
    # part B
    # create a series of boolean for your criterium
    rainy_flag = df['Actual Weather'] == 'Rainy'
    # using .loc for filter, see the document for detail
    rainy_sales = df.loc[rainy_flag,'Sales']
    # using ~ to inverse the boolean
    sunny_sales = df.loc[~rainy_flag,'Sales']