The aim of my task is to add spaces before and after punctuation. Currently I've been using an iterative str.replace() to replace each punctuation character p with " "+p+" ". How do I achieve the same output with str.translate(), where I can just pass in two lists or a dictionary:
inlist = string.punctuation
outlist = [" "+p+" " for p in string.punctuation]
inoutdict = {p:" "+p+" " for p in string.punctuation}
Let's assume that all the punctuation I have is in string.punctuation. Currently, I'm doing it like this:
from string import punctuation as punct
def punct_tokenize(text):
    for ch in text:
        if ch in punct:
            text = text.replace(ch, " "+ch+" ")
    return " ".join(text.split())
sent = "This's a foo-bar sentences with many, many punctuation."
print punct_tokenize(sent)
Also, this iterative str.replace() is taking too long; will str.translate() be any faster?
The dict form of translate() only works with unicode strings:
>>> import string
>>> inoutdict = {ord(p):unicode(" "+p+" ") for p in string.punctuation}
>>> unicode("foo,,,bar!!1").translate(inoutdict)
u'foo ,  ,  , bar !  ! 1'
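Note that translate() by itself leaves double spaces between adjacent punctuation marks; to get the same collapsed spacing as your punct_tokenize(), you can finish with the same " ".join(...split()) step your function already uses. A small sketch, continuing from the inoutdict above:
>>> translated = unicode("foo,,,bar!!1").translate(inoutdict)
>>> " ".join(translated.split())
u'foo , , , bar ! ! 1'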
Another option is with regular expressions:
>>> import re
>>> rx = '[%s]' % re.escape(string.punctuation)
>>> re.sub(rx, r" \g<0> ", "foo,,,bar!!1")
'foo ,  ,  , bar !  ! 1'
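As for whether str.translate() will actually be faster: I can't say for your data, but a rough timeit sketch along these lines lets you compare the iterative replace, the translate dict and a precompiled regex yourself. The repeated test sentence and number=1000 are arbitrary choices, not your real input, and all three variants get the same " ".join(...split()) normalization so their outputs match:
import re
import string
import timeit

# Arbitrary benchmark text; substitute your real input here.
sent = "This's a foo-bar sentences with many, many punctuation." * 10
usent = unicode(sent)

inoutdict = {ord(p): unicode(" " + p + " ") for p in string.punctuation}
rx = re.compile('[%s]' % re.escape(string.punctuation))

def punct_tokenize(text):
    # the original iterative str.replace() approach
    for ch in text:
        if ch in string.punctuation:
            text = text.replace(ch, " " + ch + " ")
    return " ".join(text.split())

print timeit.timeit(lambda: punct_tokenize(sent), number=1000)
print timeit.timeit(lambda: " ".join(usent.translate(inoutdict).split()), number=1000)
print timeit.timeit(lambda: " ".join(rx.sub(r" \g<0> ", sent).split()), number=1000)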
As usual, show us the bigger picture to get better answers: why are you doing this, where does the input come from, and so on.