In my data cleaning process I found some strings that contain stray single characters, which might bias my analysis,
e.g. 'hello please help r me with this s question'.
Until now I have only found ways to remove specific characters, like
char = 's'
def char_remover(text):
    # keep every character that is not the one stored in `char`
    spec_char = ''.join(i for i in text if i not in char)
    return spec_char
or the rsplit() and split() functions, which are good for deleting the first/last character of a string.
In the end, I want to write a function that removes all single characters (whitespace, char, whitespace) from my string/dataframe.
My own thoughts on that question:
def spec_char_remover(text):
    # iterates over characters, so len(i) is always 1 and nothing is kept
    spec_char_rem = ''.join(i for i in text if not len(i) <= 1)
    return spec_char_rem
But that obviously didn't work.
Thanks in advance.
You could use regex:
>>> import re
>>> s = 'hello please help r me with this s question'
>>> re.sub(' . ', ' ', s)
'hello please help me with this question'
"." in regex matches any character. So " . " matches any character surrounded by spaces. You could also use "\s.\s" to match any character surrounded by any whitespace.
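One caveat with " . ": each match consumes both surrounding spaces, so two single characters in a row share a space and only the first is removed ('x a b c y' becomes 'x b y'), and a single character at the very start or end of the string is never matched. A possible way around this, as a rough sketch rather than the definitive fix, is to use lookarounds so the neighbouring whitespace is checked but not consumed. The function names `remove_single_chars` and `remove_short_words` are made up for illustration. The second function shows why the attempt in the question returned nothing useful: iterating over a string yields characters, so the text has to be split into words first.

```python
import re

# A single non-whitespace character with no non-whitespace neighbour,
# i.e. a one-character "word". Lookarounds do not consume the spaces.
SINGLE_CHAR = re.compile(r'(?<!\S)\S(?!\S)')

def remove_single_chars(text):
    """Drop one-character tokens, then normalise the leftover whitespace."""
    without_singles = SINGLE_CHAR.sub('', text)
    # split() with no arguments splits on runs of whitespace and trims the ends
    return ' '.join(without_singles.split())

def remove_short_words(text):
    """Same idea without regex: split into words, keep those longer than 1."""
    return ' '.join(word for word in text.split() if len(word) > 1)

s = 'hello please help r me with this s question'
print(remove_single_chars(s))
print(remove_short_words('a b c hello d'))
```

Note that both versions also drop legitimate one-letter words such as "a" or "I" and lone punctuation marks, and both collapse repeated whitespace into single spaces. For a pandas column, either function could presumably be applied with something like df['text'].apply(remove_single_chars), where 'text' is a placeholder column name.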