In this table, all columns are strings. If there is a string == "nan" in column Title, I would like to concatenate all the rows with nan string with the last values in Title != "nan." For example
From this:
Title | Content | Measure |
---|---|---|
Background | text1 | Measure 1 |
Method | abc | Measure 1 |
nan | dfg | Measure 1 |
Background | text2 | Measure 2 |
Method | abcdfg | Measure 2 |
Background | text3 | Measure 3 |
Method | ab | Measure 3 |
nan | cd | Measure 3 |
nan | fg | Measure 3 |
# code to create original table
data = {'Title': ['Background', 'Method', 'nan', 'Background', 'Method', 'Background', 'Method', 'nan', 'nan'], 'Content': ['text1', 'abc', 'dfg', 'text2', 'abcdfg', 'text3', 'ab', 'cd', 'fg'], 'Measure': ['Measure1', 'Measure1', 'Measure1', 'Measure2', 'Measure2', 'Measure3', 'Measure3', 'Measure3', 'Measure3',]}
df = pd.DataFrame(data)
df
To this:
Title | Content | Measure |
---|---|---|
Background | text1 | Measure 1 |
Method | abcdfg | Measure 1 |
Background | text2 | Measure 2 |
Method | abcdfg | Measure 2 |
Background | text3 | Measure 3 |
Method | abcdfg | Measure 3 |
I need a function because my real data set has multiple rows that need to be fixed.
you can use:
df['last_nan_value']=df['Title'].fillna(method='ffill') #fill nans with last nan value in new column
df['id'] = ((df.last_nan_value != df.last_nan_value.shift())).cumsum() #assing an id for each title until title change
print(df)
'''
Title Content Measure last_nan_value id
0 Background text1 Measure 1 Background 1
1 Method abc Measure 1 Method 2
2 nan dfg Measure 1 Method 2
3 Background text2 Measure 2 Background 3
4 Method abcdfg Measure 2 Method 4
5 Background text3 Measure 3 Background 5
6 Method ab Measure 3 Method 6
7 nan cd Measure 3 Method 6
8 nan fg Measure 3 Method 6
'''
#group by id and use agg functions
dfx=df.groupby('id').agg({'Title':'first','Content':'sum','Measure':'first'})
dfx
id Title Content Measure
1 Background text1 Measure 1
2 Method abcdfg Measure 1
3 Background text2 Measure 2
4 Method abcdfg Measure 2
5 Background text3 Measure 3
6 Method abcdfg Measure 3