python pandas string function concatenation

Concatenate rows based on a condition in Python

In this table, all columns are strings. If there is a string == "nan" in column Title, I would like to concatenate all the rows with nan string with the last values in Title != "nan." For example

From this:

Title	Content	Measure
Background	text1	Measure 1
Method	abc	Measure 1
nan	dfg	Measure 1
Background	text2	Measure 2
Method	abcdfg	Measure 2
Background	text3	Measure 3
Method	ab	Measure 3
nan	cd	Measure 3
nan	fg	Measure 3

# code to create original table
data = {'Title': ['Background', 'Method', 'nan', 'Background', 'Method', 'Background', 'Method', 'nan', 'nan'], 'Content': ['text1', 'abc', 'dfg', 'text2', 'abcdfg', 'text3', 'ab', 'cd', 'fg'], 'Measure': ['Measure1', 'Measure1', 'Measure1', 'Measure2', 'Measure2', 'Measure3', 'Measure3', 'Measure3', 'Measure3',]} 

df = pd.DataFrame(data)  

df

To this:

Title	Content	Measure
Background	text1	Measure 1
Method	abcdfg	Measure 1
Background	text2	Measure 2
Method	abcdfg	Measure 2
Background	text3	Measure 3
Method	abcdfg	Measure 3

I need a function because my real data set has multiple rows that need to be fixed.

Solution

you can use:

df['last_nan_value']=df['Title'].fillna(method='ffill') #fill nans with last nan value in new column

df['id'] = ((df.last_nan_value != df.last_nan_value.shift())).cumsum() #assing an id for each title until title change

print(df)
'''
    Title       Content Measure     last_nan_value  id
0   Background  text1   Measure 1   Background      1
1   Method      abc     Measure 1   Method          2
2   nan         dfg     Measure 1   Method          2
3   Background  text2   Measure 2   Background      3
4   Method      abcdfg  Measure 2   Method          4
5   Background  text3   Measure 3   Background      5
6   Method      ab      Measure 3   Method          6
7   nan         cd      Measure 3   Method          6
8   nan         fg      Measure 3   Method          6


'''

#group by id and use agg functions
dfx=df.groupby('id').agg({'Title':'first','Content':'sum','Measure':'first'})
dfx

id  Title       Content Measure
1   Background  text1   Measure 1
2   Method      abcdfg  Measure 1
3   Background  text2   Measure 2
4   Method      abcdfg  Measure 2
5   Background  text3   Measure 3
6   Method      abcdfg  Measure 3