Pandas aggregation function: Merge text rows, but insert spaces between them?

I managed to group rows in a dataframe, given one column (id). The problem is that one column consists of parts of sentences, and when I add them together, the spaces are missing.

An example probably makes it easier to understand...

My dataframe looks something like this:

import pandas as pd

# Create the DataFrame
df = pd.DataFrame({'id': [101, 101, 102, 102, 102],
                   'text': ['The government changed', 'the legislation on import control.', 'Politics cannot solve all problems', 'but it should try to do its part.', 'That is the reason why these elections are important.'],
                   'date': [1990, 1990, 2005, 2005, 2005],})
    id                                               text  date
0  101                             The government changed  1990
1  101                 the legislation on import control.  1990
2  102                 Politics cannot solve all problems  2005
3  102                  but it should try to do its part.  2005
4  102  That is the reason why these elections are imp...  2005

Then I used the aggregation function:

aggregation_functions = {'id': 'first','text': 'sum', 'date': 'first'}
df_new = df.groupby(df['id']).aggregate(aggregation_functions)

which returns:

    id    text                                                        date
0  101    The government changedthe legislation on import control.    1990
2  102    Politics cannot solve all problemsbut it should try to...   2005

So, for example, I need a space between 'The government changed' and 'the legislation...'. Is that possible?


  • If you need to put a space between the two phrases/rows, use str.join:

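    # Join each group's strings with a space; dict.fromkeys drops repeated
    # phrases within a group while keeping their original order
    ujoin = lambda s: " ".join(dict.fromkeys(s.astype(str)))
    # Group by id and date, aggregate text, then restore the original column order
    out = df.groupby(["id", "date"], as_index=False).agg(**{"text": ("text", ujoin)})[df.columns]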

    # Output:

        id                                                                                                                        text  date
    0  101                                                                   The government changed the legislation on import control.  1990
    1  102  Politics cannot solve all problems but it should try to do its part. That is the reason why these elections are important.  2005
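If you would rather keep the dictionary-style aggregation from the question, a minimal sketch is to swap 'sum' for ' '.join on the text column. It assumes every value in text is already a string and, unlike ujoin above, it keeps repeated phrases rather than dropping them. Grouping by 'id' with as_index=False is a choice made here so that id stays a regular column; it is not taken from the question's code.

```python
import pandas as pd

# Same sample data as in the question
df = pd.DataFrame({
    'id': [101, 101, 102, 102, 102],
    'text': ['The government changed',
             'the legislation on import control.',
             'Politics cannot solve all problems',
             'but it should try to do its part.',
             'That is the reason why these elections are important.'],
    'date': [1990, 1990, 2005, 2005, 2005],
})

# ' '.join receives each group's text values and concatenates them
# with a single space; 'first' keeps the first date seen per id.
aggregation_functions = {'text': ' '.join, 'date': 'first'}

# as_index=False keeps 'id' as a column instead of moving it to the index
df_new = df.groupby('id', as_index=False).agg(aggregation_functions)

print(df_new)
```

If the text column may contain non-string values, converting it first with df['text'].astype(str) avoids a TypeError from str.join.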