Search code examples
pythonpandaslistseries

How do I join the following list of series based on series name?


I have two object series, one before and one after, grouped by name.

I generated them from the following code:

beforeseries = dfbefore.groupby('name', dropna=True)['order'].apply(list)
print(beforeseries)
afterseries = dfafter.groupby('name', dropna=True)['order'].apply(list)
print(afterseries)

Which generate the following outputs:

Beforeseries:

Name1 [first, second, third] Name2 [first, second, third] Name_n [first, second, third, fourth]

Afterseries:

Name1 [fourth, fifth] Name2 [fourth, fifth, sixth] Name_n [fifth, sixth]

I want to join them so the output looks like the following”

Name 1 [‘first second third’, ‘fourth fifth’] Name 2 [‘first second third’, ‘fourth fifth sixth’] Name 3 [‘first second third fourth’, ‘fifth sixth’]


Solution

  • IIUC, seems like you need to concat both objects :

    out = (
        pd.concat([beforeseries, afterseries], axis=1)
            .groupby(level=0, axis=1).agg(lambda x: x.to_numpy().tolist())
            # .squeeze() # uncomment this chain if you need a Series
    )
    

    Another variant :

    # to turn off the FutureWarning in `2.1.0`:
    # DataFrame.groupby with axis=1 is deprecated.
    # Do `frame.T.groupby(...)` without axis instead.
    
    out = (
        pd.concat([beforeseries, afterseries], axis=1)
            .T.groupby(level=0).agg(list).T
    )
    

    Alternatively, without creating the intermediate Series :

    keys = ["before", "after"]
    
    out = (
        pd.concat([dfbefore, dfafter], keys=keys, names=["when", None])
            .groupby(by=["name", "when"], sort=False)["order"].agg(" ".join)
            .unstack().agg(list, axis=1)
            # .to_frame("order")
    )
    

    Output :

    print(out)
    
               order
    name            
    A     [v x, o q]
    B         [y, r]
    C     [w z, p s]
    

    Used inputs :

    dfbefore = pd.DataFrame({"name": list("ACABC"), "order": list("vwxyz")})
    beforeseries = dfbefore.groupby("name", dropna=True)["order"].agg(" ".join)
    
    dfafter = pd.DataFrame({"name": list("ACABC"), "order": list("opqrs")})
    afterseries  = dfafter.groupby("name", dropna=True)["order"].agg(" ".join)