ValueError: Must have equal len keys and value when setting with an iterable when setting value on group


I have a pandas DataFrame with a timestamp index. I group it by hour, run a series of operations on the values of each hour, and then need to write the results back to the original DataFrame:

for name, group in df.groupby(pd.Grouper(freq="1H")):
    if group.shape[0] > 0:
        results = some_function(group) # Operations on the group, returns a list of labels same length of the group
        df.loc[group.index, 'results'] = results 

I am getting the error "ValueError: Must have equal len keys and value when setting with an iterable", but it only happens after many successful iterations (hours) of the for loop. Any ideas?


Solution

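  • One possible cause is duplicated index values: when labels repeat, df.loc[group.index, ...] can select more rows than the group contains, so the assigned list no longer matches in length. A possible solution is to avoid assigning with loc inside a loop and use groupby.apply instead: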

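    import pandas as pd

    df = pd.DataFrame({'value': [10, 20, 30, 40]},
                      index=pd.to_datetime([
                          "2021-01-01 00:00:00",
                          "2021-01-01 00:00:00",
                          "2021-01-01 00:30:00",
                          "2021-01-01 01:00:00"]))

    # custom function: placeholder that returns one label per row
    def some_function(x):
        return range(len(x))

    def helper(group):
        # work on a copy so the assignment does not touch a view of df
        group = group.copy()
        group['results'] = some_function(group)
        return group

    # apply the helper to each hourly group; group_keys=False keeps the original index
    out = df.groupby(pd.Grouper(freq="1h"), group_keys=False).apply(helper)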
    print(out)
                         value  results
    2021-01-01 00:00:00     10        0
    2021-01-01 00:00:00     20        1
    2021-01-01 00:30:00     30        2
    2021-01-01 01:00:00     40        0
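
As a rough illustration of why duplicated index values can break the loc assignment, the sketch below reuses the toy frame from this answer and compares the size of the first hourly group with the number of rows that df.loc[group.index] selects. The names first_hour, n_group and n_selected are made up for this example.

```python
import pandas as pd

# Same toy frame as in the answer: the label 00:00:00 appears twice.
df = pd.DataFrame(
    {"value": [10, 20, 30, 40]},
    index=pd.to_datetime([
        "2021-01-01 00:00:00",
        "2021-01-01 00:00:00",
        "2021-01-01 00:30:00",
        "2021-01-01 01:00:00",
    ]),
)

# Take the first hourly group (00:00 to 01:00, three rows).
_, first_hour = next(iter(df.groupby(pd.Grouper(freq="1h"))))

n_group = len(first_hour)                   # rows in the group
n_selected = len(df.loc[first_hour.index])  # rows matched by label lookup

print(df.index.is_unique)   # False
print(n_group, n_selected)  # 3 5
```

Each repeated label matches every row that carries it, so the selection has 5 rows while the function returns only 3 labels. That would be consistent with the error appearing only in hours that contain a repeated timestamp.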
    

    Another possible cause is a mismatch between the length of a group and the length of the array/list returned by your custom function. Here is a way to find the problematic data:

    df = pd.DataFrame({'value': [10, 20, 30, 40]}, 
                      index=pd.to_datetime([
                      "2021-01-01 00:00:00",
                      "2021-01-01 00:00:00",
                      "2021-01-01 00:30:00",
                      "2021-01-01 01:00:00"]))
    
    # simulate different lengths of lists returned from the function
    def some_function(x):
        if len(x) == 1:
            return range(len(x) * 2)
        else:
            return range(len(x))

    def helper(group):
        # print each group with both lengths to spot the mismatch
        print(group)
        print(f'Length of group is {len(group)}')
        print(f'Length of output from function is {len(some_function(group))}')
        group['results'] = some_function(group)  # raises ValueError when the lengths differ
        return group

    out = df.groupby(pd.Grouper(freq="1h"), group_keys=False).apply(helper)
    # print(out)
    

                         value
    2021-01-01 00:00:00     10
    2021-01-01 00:00:00     20
    2021-01-01 00:30:00     30
    Length of group is 3
    Length of output from function is 3
                         value
    2021-01-01 01:00:00     40
    Length of group is 1
    Length of output from function is 2
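
Both checks can also be run against the original loop without triggering the error. The following is only a sketch under assumptions: some_function is a hypothetical stand-in that deliberately returns too many labels for single-row groups, and duplicated_labels and mismatched are names invented for this example.

```python
import pandas as pd

df = pd.DataFrame(
    {"value": [10, 20, 30, 40]},
    index=pd.to_datetime([
        "2021-01-01 00:00:00",
        "2021-01-01 00:00:00",
        "2021-01-01 00:30:00",
        "2021-01-01 01:00:00",
    ]),
)

# Hypothetical stand-in for the real function: returns one label too many
# for single-row groups, to simulate the bug.
def some_function(group):
    n = len(group)
    return list(range(n * 2)) if n == 1 else list(range(n))

# Check 1: index labels that occur more than once.
duplicated_labels = df.index[df.index.duplicated(keep=False)].unique()

# Check 2: groups whose output length differs from the group length.
mismatched = []
for name, group in df.groupby(pd.Grouper(freq="1h")):
    if group.empty:
        continue  # skip hours without data
    n_out = len(some_function(group))
    if n_out != len(group):
        mismatched.append((name, len(group), n_out))

print(duplicated_labels)
print(mismatched)
```

If duplicated_labels is non-empty, making the index unique (for example with reset_index) or using the groupby.apply approach avoids the label lookup. If mismatched is non-empty, the listed hours are the ones to inspect in the custom function.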