pandas, dataframe, average, segment

average on dataframe segments


I have a DataFrame whose values drop to zero after each cycle of operation (the cycles have random length). I want to calculate the average (or perform other operations) separately for each segment between the zeros. For example, the average of [0.762, 0.766] alone, then of [0.66, 1.37, 2.11, 2.29] alone, and so on until the end of the DataFrame.



Solution

  • I worked with this data:

        random_value
    0   0
    1   0
    2   1
    3   2
    4   3
    5   0
    6   4
    7   4
    8   0
    9   1
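
    To make the snippets in this answer runnable, the table above can be rebuilt as a DataFrame. This is a minimal sketch, assuming a single column named random_value (as in the table) and the default integer index:

```python
import pandas as pd

# Sample data copied from the table above; the column name is taken from its header
df = pd.DataFrame({"random_value": [0, 0, 1, 2, 3, 0, 4, 4, 0, 1]})
print(df)
```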
    

    There is probably a much better solution, but here is what I came up with:

    def avg_function(df):
        avg_list = []   # one average per non-zero segment
        temp_list = []  # values of the segment currently being read
        for value in df["random_value"]:
            if value == 0:
                # a zero closes the current segment, if one is open
                if temp_list:
                    avg_list.append(sum(temp_list) / len(temp_list))
                    temp_list = []
            else:
                temp_list.append(value)
        if temp_list:  # last segment, when the data does not end with a zero
            avg_list.append(sum(temp_list) / len(temp_list))
        return avg_list
    
    test_list = avg_function(df=df)
    test_list
    
    [Out] : [2.0, 4.0, 1.0]
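
    The same segment averages can probably be obtained without an explicit loop. The sketch below is an alternative to the function above, not a replacement for it: it assumes the same sample data and that zeros are the only segment separators. The names is_zero, segment_id and segment_means are made up for this illustration.

```python
import pandas as pd

df = pd.DataFrame({"random_value": [0, 0, 1, 2, 3, 0, 4, 4, 0, 1]})

values = df["random_value"]
is_zero = values.eq(0)

# Every zero bumps the running count, so all non-zero rows between
# two zeros end up sharing the same segment id
segment_id = is_zero.cumsum()

# Drop the zero rows, then average what is left in each segment
segment_means = values[~is_zero].groupby(segment_id[~is_zero]).mean().tolist()
print(segment_means)  # [2.0, 4.0, 1.0]
```

    Swapping .mean() for another aggregation (.sum(), .max(), .agg(...)) should cover the "other operations" case from the question.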
    

    Edit: as requested in the comments, here is a way to add the means to the DataFrame. I don't know if there is a way to do that directly with pandas (there might be!), but I came up with this:

    def add_mean(df, mean_list):
        temp_mean_list = []
        list_index = 0  # index of the current segment's mean in mean_list

        # Previous row's value (0 for the first row), kept as a local list
        # so no helper column is left behind on the caller's DataFrame
        random_value = list(df["random_value"])
        random_value_shifted = list(df["random_value"].shift(1).fillna(0))

        for i in range(df.shape[0]):
            if random_value[i] == 0 and random_value_shifted[i] == 0:
                temp_mean_list.append(0)  # zero row, no segment just ended
            elif random_value[i] == 0 and random_value_shifted[i] != 0:
                temp_mean_list.append(0)  # zero row that closes a segment
                list_index += 1  # move on to the next segment's mean
            else:
                temp_mean_list.append(mean_list[list_index])
        df = df.copy()  # leave the caller's DataFrame untouched
        df["mean"] = temp_mean_list
        return df
    
    df = add_mean(df=df, mean_list=test_list)
    

    Which gave me:

    df
    
    [Out] :
        random_value    mean
    0   0               0
    1   0               0
    2   1               2
    3   2               2
    4   3               2
    5   0               0
    6   4               4
    7   4               4
    8   0               0
    9   1               1
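
    For reference, a pandas-only way to build the same column might look like the sketch below. It assumes the same sample data and the same convention of writing 0 on the zero rows. The helper names is_zero, segment_id and nonzero are made up for this illustration.

```python
import pandas as pd

df = pd.DataFrame({"random_value": [0, 0, 1, 2, 3, 0, 4, 4, 0, 1]})

is_zero = df["random_value"].eq(0)
segment_id = is_zero.cumsum()  # each zero starts a new segment id

# Turn the zeros into NaN so they are ignored when averaging
nonzero = df["random_value"].where(~is_zero)

# transform("mean") broadcasts each segment's mean back onto its rows;
# the final where() puts 0 back on the zero rows
df["mean"] = nonzero.groupby(segment_id).transform("mean").where(~is_zero, 0)
print(df)
```

    Note that the resulting column is float-typed, so the values print as 0.0, 2.0, 4.0 and so on.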