SO!
Here is a function using itertools.groupby:
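    from string import whitespace, punctuation
    from itertools import groupby

    def tokenize(phrase, sepcat=True):
        # Map every separator character to True; any other character gives None via .get
        separators = dict.fromkeys(whitespace + punctuation, True)
        # Consecutive characters with the same key (True or None) end up in one group
        return [''.join(g) for k, g in groupby(phrase, separators.get)]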
Right now, the output is as follows:
As you can see, consecutive separators are concatenated into a single string. I would like this behavior to be optional (as denoted by the sepcat parameter to my function), but this is where I hit a roadblock... How can I pass parameters to separators.get?
Can something like functools help me here?
Use a lambda as the key function, so that it can pass the extra arguments along:
    groupby(..., lambda x: my_normal_function(x, other, arguments))
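Applied to the tokenizer from the question, this could look roughly like the sketch below. It uses functools.partial, which does the same job as the lambda. The names SEPARATORS and group_key are made up for illustration, and pairing each character with its index through enumerate is just one assumed way to stop separators from merging when sepcat is False; it is not the only option.

```python
from functools import partial
from itertools import groupby
from string import punctuation, whitespace

# Hypothetical module-level set of separator characters
SEPARATORS = frozenset(whitespace + punctuation)

def group_key(indexed_char, sepcat):
    """Key function for groupby; receives (index, char) pairs from enumerate."""
    index, char = indexed_char
    if char not in SEPARATORS:
        # Every character of a word shares the same key, so words stay whole
        return (False, None)
    # With sepcat, neighbouring separators share a key and are merged;
    # without it, the index makes each separator's key unique
    return (True, None) if sepcat else (True, index)

def tokenize(phrase, sepcat=True):
    # partial fixes the extra argument, leaving a one-argument key function
    key = partial(group_key, sepcat=sepcat)
    return [''.join(char for _, char in group)
            for _, group in groupby(enumerate(phrase), key)]

print(tokenize("hello, world!"))                # ['hello', ', ', 'world', '!']
print(tokenize("hello, world!", sepcat=False))  # ['hello', ',', ' ', 'world', '!']
```

Writing key = lambda pair: group_key(pair, sepcat) instead of the partial call should behave the same way; which one to use is mostly a matter of taste.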