python, dataframe, finance

I want to compute the mean (by row) of different sets of dataframe columns and store them in a new dataframe


To do so, I have a list of lists (these are my clusters), for example:

asset_clusts=[[0,1],[3,5],[2,4, 12],...]

The original dataframe (in my code I call it 'x') holds the return time series of S&P 500 companies.

I want to select columns [0, 1] of the original dataframe, compute their mean (by row), and store it in a new dataframe, then compute the mean of columns [3, 5] and add it to the new dataframe, and so on.

mu=pd.DataFrame() 
for j in range(get_number_of_elements(asset_clusts)):
    mu=x.iloc[:,asset_clusts[j]].mean(axis=1)

However, this gives me only a single column, and I checked that this column is the mean of the last cluster's columns.

For clarity, the get_number_of_elements function is:

def get_number_of_elements(clist):
    count = 0
    for element in clist:
        count += 1
    return count

Solution

  • I solved it. In case it is helpful for others, here is the final function:

    import pandas as pd

    def clustered_series(x, org_asset_clust):
        """
        x: dataframe of returns (one column per asset)
        org_asset_clust: list of clusters, each a list of column positions
        ----> dataframe with the row-wise mean of each cluster
        """
        mu = []
        for cluster in org_asset_clust:
            # Row-wise mean over the columns that belong to this cluster
            mu.append(x.iloc[:, cluster].mean(axis=1))
        # Concatenate once, after the loop: one column per cluster
        cluster_mean = pd.concat(mu, axis=1)

        return cluster_mean
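
The original loop kept only the last cluster because mu was reassigned on every iteration instead of being accumulated. As a possible alternative to collecting a list and concatenating, here is a minimal sketch that builds the result from a dict of row-wise means. The name cluster_means and the toy numbers are made up for illustration; they are not from the question's data.

```python
import pandas as pd

def cluster_means(x, clusters):
    """Row-wise mean of each cluster of column positions.

    Returns a dataframe with one column per cluster, labelled 0, 1, 2, ...
    """
    return pd.DataFrame(
        {j: x.iloc[:, cols].mean(axis=1) for j, cols in enumerate(clusters)}
    )

# Toy data (hypothetical): 3 dates (rows) and 4 assets (columns)
x = pd.DataFrame(
    [[0.01, 0.03, 0.02, 0.04],
     [0.00, 0.02, -0.01, 0.01],
     [0.02, 0.04, 0.03, 0.05]]
)
asset_clusts = [[0, 1], [2, 3]]

mu = cluster_means(x, asset_clusts)
print(mu)
```

Each column of mu is the mean of one cluster, and the row index of x is preserved, so the result lines up with the original return series.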