I'm currently trying to split text, especially where there is no space after the '.' (dot). The df was read from a CSV file.
My current code (don't mind the spaces please):
for i in df['blurb']:
    try:
        df.loc[i,'blurb'] = df.loc[i,'blurb'].replace('.A', '.\nA')
    except:
        pass
    ...
    try:
        df.loc[i,'blurb'] = df.loc[i,'blurb'].replace('.Z', '.\nZ')
    except:
        pass
and so on for every letter of the alphabet, since I'm looking to put a \n (new line) after every such dot.
The result is the same as the original (the changes are not saved over the original). If I create another column, blurb2, it ends up identical to the original blurb column. I've already spent a few hours looking for answers on this site, but nothing seems to work (no error messages though)... This is driving me crazy...
Does anyone have any tips? Thanks a mill in advance!
Cheers
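To insert a newline after every dot that is immediately followed by a non-whitespace character, you can use
df['blurb'] = df['blurb'].str.replace(r'\.(?=\S)', '\\g<0>\n', regex=True)
(regex=True is needed in pandas 2.0 and later, where str.replace treats the pattern as a literal string by default.)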
Note here
\. - matches a dot char that is followed with
(?=\S) - any char other than whitespace. Since it is a regex lookahead, its pattern is only checked for, but does not get consumed.
\g<0> - is the whole match value matched by the regex.
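To see what the pattern does in isolation, here is a minimal sketch using Python's built-in re module on a plain string instead of a DataFrame column. The helper name break_after_dots and the sample text are made up for illustration; they do not come from the question.

```python
import re

# Pattern from the answer: a literal dot, matched only when a
# non-whitespace character follows it (the lookahead consumes nothing).
pattern = re.compile(r"\.(?=\S)")

def break_after_dots(text):
    # Hypothetical helper: \g<0> re-inserts the matched dot,
    # then a newline is appended right after it.
    return pattern.sub("\\g<0>\n", text)

# Made-up sample text standing in for one 'blurb' cell
sample = "First sentence.Second one. Third.Fourth."
result = break_after_dots(sample)
print(repr(result))
```

Dots followed by a space, or sitting at the very end of the string, are left alone. Be aware that the pattern also fires inside numbers such as 3.14, so it may need tightening (for example to (?=[A-Z])) if the blurbs contain decimals. As for the original loop, it most likely changed nothing because for i in df['blurb'] yields the cell texts rather than index labels, so df.loc[i, 'blurb'] raises a KeyError that the bare except silently swallows.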