I have a pandas DataFrame with two columns: tags, which holds numbers, and elements, which holds a list of strings in each row.
DataFrame:
import pandas as pd

df = pd.DataFrame({
    'tags': {0: 1, 1: 1, 2: 1, 3: 1, 4: 1},
    'elements': {
        0: ['\n☒', '\nANNUAL REPORT PURSUANT TO SECTION 13 OR 15(d) OF THE SECURITIES EXCHANGE ACT OF 1934 '],
        1: ['', ''],
        2: ['\n', '\nFor the Fiscal Year Ended June 30, 2020'],
        3: ['\n', '\n'],
        4: ['\n', '\nOR']
    }
})
I am trying to remove every \n from each string in the lists in the elements column, but I'm really struggling to do so. I tried a nested loop with re.sub() to replace them, but it has done nothing (granted, this is a horrible solution). This was my attempt:
import re

for i in range(len(page_table.elements)):
    for st in range(len(page_table.elements[i])):
        page_table.elements[i][st] = re.sub('\n', '', page_table.elements[i][st])
Is there a way to do this?
You can explode the lists and then replace the \n values.
You can leave out the .groupby(level=0).agg(list) step if you don't need the values put back into lists, though the result then has one row per list element and so a different shape from the original DataFrame.
df["elements"] = (
    df["elements"]
    .explode()
    .str.replace(r"\n", "", regex=True)
    .groupby(level=0)
    .agg(list)
)
Which outputs:
0 [☒, ANNUAL REPORT PURSUANT TO SECTION 13 OR 15...
1 [, ]
2 [, For the Fiscal Year Ended June 30, 2020]
3 [, ]
4 [, OR]
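To make the shape difference concrete, here is a minimal sketch on a shortened copy of the data (the three-row frame and the names exploded and cleaned are only for illustration). It shows what you get when you stop after the replace, and, as a different technique from explode, a plain list comprehension applied per row, which should give the same lists without changing the shape at all.

```python
import pandas as pd

# Shortened, illustrative copy of the question's data.
df = pd.DataFrame({
    "tags": [1, 1, 1],
    "elements": [["\n☒", "\nANNUAL REPORT"], ["", ""], ["\n", "\nOR"]],
})

# Variant 1: stop after explode/replace. The result has one row per list
# element, and each original row label is repeated in the index.
exploded = df["elements"].explode().str.replace("\n", "", regex=False)

# Variant 2 (not explode-based): rebuild each list with a comprehension.
# The result keeps one list per original row.
cleaned = df["elements"].apply(
    lambda items: [s.replace("\n", "") for s in items]
)

print(exploded)
print(cleaned)
```

One reason you might prefer the comprehension: .groupby(level=0) regroups by index label, so it relies on the original index having unique labels, whereas apply works row by row regardless of the index.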