Tags: python, for-loop, replace, text-processing

Adding a newline (\n) to a text blurb under a condition using Python


I'm currently trying to split text, especially where there is no space after a '.' (dot). The df is loaded from a CSV file.

My current code (please don't mind the indentation):

for i in df['blurb']:
  try:
    df.loc[i,'blurb'] = df.loc[i,'blurb'].replace('.A', '.\nA')
  except:
    pass 
...
  try:
    df.loc[i,'blurb'] = df.loc[i,'blurb'].replace('.Z', '.\nZ')
  except:
   pass

and so on for every letter of the alphabet, since I'm looking to put a \n (newline) after every such dot.

The result is the same as the original (the changes don't get saved over the original). If I create another column, [blurb2], it gives the same outcome as the original blurb column. I've already spent a few hours searching this site for answers, but nothing seems to work (no error messages, though)... This is driving me crazy...

Does anyone have any tips? Thanks a million in advance!

Cheers
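The silent failure described in the question has a likely explanation: `for i in df['blurb']` yields each cell's text, not its row label, so `df.loc[i, 'blurb']` raises a `KeyError`, which the bare `except: pass` swallows without any message. The sketch below reproduces this on a small hypothetical DataFrame (the sample strings are made up, and it assumes the default integer index that `pd.read_csv` produces without `index_col`). It catches `KeyError` explicitly only so the failures can be counted, then shows one way to repair the loop by iterating over `df.index`.

```python
import string

import pandas as pd

# Hypothetical sample rows standing in for the CSV's 'blurb' column.
# Assumes the default RangeIndex (0, 1, ...) that pd.read_csv gives.
df = pd.DataFrame({"blurb": ["Great read.Highly recommended", "Short. Sweet"]})

# 1) The question's pattern: `i` is each cell's TEXT, not its row label,
#    so df.loc[i, "blurb"] looks up a row label that does not exist.
swallowed = []
for i in df["blurb"]:
    try:
        df.loc[i, "blurb"] = df.loc[i, "blurb"].replace(".H", ".\nH")
    except KeyError as exc:  # the bare `except: pass` hides exactly this
        swallowed.append(exc)

print(len(swallowed), "lookups failed")
print(df["blurb"].tolist())  # still unchanged

# 2) A corrected loop: iterate over row labels instead of cell values.
for idx in df.index:
    text = df.loc[idx, "blurb"]
    for letter in string.ascii_uppercase:  # 'A' through 'Z'
        text = text.replace("." + letter, ".\n" + letter)
    df.loc[idx, "blurb"] = text

print(df["blurb"].tolist())
```

The corrected loop works, but it still makes 26 passes over every string; a single regex replacement on the whole column does the same job in one call.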


Solution

  • To insert a newline after every dot that is immediately followed by a non-whitespace character, you can use

    df['blurb'] = df['blurb'].str.replace(r'\.(?=\S)', '\\g<0>\n', regex=True)  # regex=True is required on pandas >= 2.0
    

    Notes:

    • \. - matches a literal dot that is followed by
    • (?=\S) - any character other than whitespace. This is a positive lookahead, so its pattern is only checked, not consumed: the following character stays in place and is not part of the match.

    The \g<0> in the replacement refers to the whole match (here, just the dot), so each matched dot is written back followed by a newline.
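A small runnable check of the regex approach, on hypothetical sample strings. It passes `regex=True` explicitly because pandas 2.0 changed the default of `regex` in `Series.str.replace` to `False`, in which case the pattern would be searched for as literal text and nothing would change. The second pattern, `\.(?=[A-Z])`, is an assumed narrower variant that mirrors the question's A to Z loop; it is not part of the original answer.

```python
import pandas as pd

# Hypothetical sample rows standing in for the CSV's 'blurb' column
df = pd.DataFrame(
    {"blurb": ["Great read.Highly recommended", "Short. Sweet", "Now 3.50 each.Buy"]}
)

# The answer's pattern: a dot followed by any non-whitespace character.
# "\\g<0>" re-inserts the matched dot; "\n" is a real newline character.
any_char = df["blurb"].str.replace(r"\.(?=\S)", "\\g<0>\n", regex=True)

# Assumed narrower variant: only break before an uppercase ASCII letter,
# like the question's '.A' ... '.Z' replacements.
upper_only = df["blurb"].str.replace(r"\.(?=[A-Z])", "\\g<0>\n", regex=True)

print(any_char.tolist())
print(upper_only.tolist())

# Store the result in a separate column, as the question tried with 'blurb2'
df["blurb2"] = upper_only
```

Note that the `\S` version also breaks inside decimals such as `3.50`, which is why the uppercase-only lookahead may suit prose better.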