Pandas simple groupby and apply complains "Columns must be same length as key"


Essentially, I have a table of timestamps and some data. I want to group the rows by identical timestamps and change the timestamps within each group. I got something working based on the question "Interpolate seconds to milliseconds in dataset?".

The solution seems to work fine for many rows but not for simple datasets, and I can't figure out why. I've narrowed it down to the simple example below.

Data:

    t  val
    0  0.3
    0  0.2
    0  0.6
    0  0.4

Expected result:

    t  val
    1  0.3
    1  0.2
    1  0.6
    1  0.4

Code:

    import pandas as pd

    df = pd.DataFrame([[0, 0.3], [0, 0.2], [0, 0.6], [0, 0.4]], columns=["t", "val"])

    # Group by timestamp and add 1 to each (just for demonstration)
    df.t = df.groupby("t", group_keys=False).apply(lambda df: df.t + 1)

This raises "ValueError: Columns must be same length as key", and I can't see what I'm doing wrong. Any help is appreciated.


Solution

  • To write the output values back to a column, use GroupBy.transform and select the column to process after the groupby:

    df.t = df.groupby('t')['t'].transform(lambda x: x + 1)
    

    The linked solution that uses np.linspace should be changed to:

    df.t = df.groupby('t')['t'].transform(lambda x: x + np.linspace(0, 1, len(x)))
    print(df)
              t  val
    0  0.000000  0.3
    1  0.333333  0.2
    2  0.666667  0.6
    3  1.000000  0.4 
    

    Or add a per-group counter with GroupBy.cumcount:

    df.t += df.groupby('t').cumcount()
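
As for why the original apply call fails, here is a possible explanation, offered as a sketch rather than a definitive account. When every group returns a Series with the same index, DataFrameGroupBy.apply stacks those Series into a wide DataFrame with one row per group. With a single timestamp that condition holds trivially, so the result is a one-row, four-column DataFrame, and assigning it to the single column t is what appears to trigger "Columns must be same length as key". The sketch below uses the val column inside apply (an assumption made so that it behaves the same whether or not the pandas version passes the grouping column to the function), and the names wide, aligned, multi and counter are made up for illustration.

```python
import warnings

import pandas as pd

# Same toy frame as in the question: one timestamp, four rows.
df = pd.DataFrame([[0, 0.3], [0, 0.2], [0, 0.6], [0, 0.4]], columns=["t", "val"])

with warnings.catch_warnings():
    # Some pandas versions warn about apply and grouping columns; silence that here.
    warnings.simplefilter("ignore")
    # Only one group exists, so the single returned Series becomes
    # one row of a wide DataFrame instead of a column-shaped Series.
    wide = df.groupby("t").apply(lambda g: g["val"] + 1)

print(type(wide).__name__, wide.shape)

# transform returns a result aligned with the original index,
# so it can be assigned straight back to a column.
aligned = df.groupby("t")["t"].transform(lambda x: x + 1)
print(aligned.index.equals(df.index))

# Hypothetical frame with several timestamps: cumcount numbers the rows per group.
multi = pd.DataFrame({"t": [0, 0, 1, 1, 1], "val": [0.3, 0.2, 0.6, 0.4, 0.1]})
counter = multi.groupby("t").cumcount()
print(counter.tolist())  # [0, 1, 0, 1, 2]
```

This would also explain why the apply version seemed fine on larger datasets: once there are several groups with different row labels, the per-group results are concatenated into a Series rather than stacked into a wide frame. transform and cumcount avoid the ambiguity because their output has the same length and index as the input.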