
Pandas bin data based on a column then find correlations for each dataframe


I have the following dataframe,

id       sqft  years_left date         price
0        1400  65         01-01-2021   xxx
1        1200  49         01-01-1950   xxx
.. 
950,000  1600  10         09-05-1990   xx


I want to run a correlation between years_left and price.

But I want to bin the data by the years_left column into 100 bins, between 1 and 999 years, like this:

df1 = df[(df['years_left'] > 0) & (df['years_left'] < 10)]
df2 = df[(df['years_left'] > 11) & (df['years_left'] < 20)]
....

Right now I would have to create each dataframe manually, or use a for loop to build them into a list.

Is there a shortcut to do what I want: bin the data into however many bins I want, then find the correlation within each bin?


Solution

  • You can create a dict of dataframes like this:

    import pandas as pd
    # 100 bins of width 10 covering years_left values from 0 to 1000
    df_dict = {f'df{index}': group for index, (_, group) in enumerate(
        df.groupby(pd.cut(df.years_left, bins=range(0, 1001, 10))))}
    

    Then you can access the individual dataframes like:

    df_dict['df1'] and so on ...
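
    To get the per-bin correlations the question asks for, here is a minimal sketch assuming df_dict was built as above (the name correlations is hypothetical):

    # Correlation between years_left and price inside each bin;
    # bins with fewer than two rows come out as NaN.
    correlations = {name: sub['years_left'].corr(sub['price'])
                    for name, sub in df_dict.items()}

    correlations['df1']   # e.g. the correlation for the (10, 20] bin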
    

    NOTE: Instead of a dict comprehension you can also use a list comprehension, as in the sketch below.
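
    For reference, a minimal sketch of that list-comprehension variant (the name df_list is hypothetical; bins are then addressed by position rather than by key):

    # Same binning as above, collected in a list instead of a dict:
    # df_list[0] is the (0, 10] bin, df_list[1] the (10, 20] bin, and so on.
    df_list = [group for _, group in
               df.groupby(pd.cut(df.years_left, bins=range(0, 1001, 10)))]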