
Pandas bin data based on a column then find correlations for each dataframe


I have the following dataframe,

id       sqft  years_left date         price
0        1400  65         01-01-2021   xxx
1        1200  49         01-01-1950   xxx
.. 
950,000  1600  10         09-05-1990   xx


I want to run a correlation between years_left and price.

But I want to bin the data by the years_left column into 100 bins, between 1 and 999 years, like this:

df1 = df[(df['years_left'] > 0) & (df['years_left'] < 10)]
df2 = df[(df['years_left'] > 11) & (df['years_left'] < 20)]
....

Right now I would have to create each dataframe manually, or use a for loop to build them into a list.

Is there a shortcut to do what I want: bin the data into however many bins I want, then find the correlation within each bin?


Solution

  • You can create a dict of dataframes like this:

    import pandas as pd
    # 100 bins of width 10 covering years_left values from 0 to 1000
    df_dict = {f'df{index}': group for index, (_, group) in enumerate(
        df.groupby(pd.cut(df.years_left, bins=range(0, 1001, 10))))}
    

    Then you can access the individual dataframes like:

    df_dict['df1'] and so on ...
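
    To get the per-bin correlations the question asks for, here is a minimal sketch assuming df_dict was built as above (the name correlations is hypothetical):

    # Correlation between years_left and price inside each bin;
    # bins with fewer than two rows come out as NaN.
    correlations = {name: sub['years_left'].corr(sub['price'])
                    for name, sub in df_dict.items()}

    correlations['df1']   # e.g. the correlation for the (10, 20] bin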
    

    NOTE: Instead of a dict comprehension you can also use a list comprehension, as in the sketch below.
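
    For reference, a minimal sketch of that list-comprehension variant (the name df_list is hypothetical; bins are then addressed by position rather than by key):

    # Same binning as above, collected in a list instead of a dict:
    # df_list[0] is the (0, 10] bin, df_list[1] the (10, 20] bin, and so on.
    df_list = [group for _, group in
               df.groupby(pd.cut(df.years_left, bins=range(0, 1001, 10)))]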