I am using the Keras Tokenizer for my text preparation.
My x values contain dates like 26.07.2020 or 27.September 1993.
I want the tokenizer to add September as a word to the index, and also 26 and 2020.
I used char_level=True before, but I think the model should perform better with words like September as word tokens.
Is this possible with the Keras Tokenizer, and if yes, how?
Thanks a lot.
You can replace each . with a space. The Tokenizer splits your sentence on whitespace and then tokenizes each word, so 26, 07, 2020 and September each become their own token.
So a simple solution would be
x = x.replace('.', ' ')
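To make the effect visible, here is a minimal sketch in plain Python of what happens to the two example values. It does not call Keras: split_dates and build_word_index are hypothetical helper names, and the index is assigned in order of first appearance, whereas the Keras Tokenizer orders its word index by frequency. Lowercasing is included because the Tokenizer lowercases by default.

```python
def split_dates(text):
    """Replace dots with spaces so the parts of a date become separate words."""
    return text.replace('.', ' ')


def build_word_index(texts):
    """Map each distinct lowercase word to an index starting at 1.

    Assumption: indices follow order of first appearance, which is a
    simplification of the frequency-based ordering Keras uses.
    """
    word_index = {}
    for text in texts:
        for word in split_dates(text).lower().split():
            if word not in word_index:
                word_index[word] = len(word_index) + 1
    return word_index


samples = ["26.07.2020", "27.September 1993"]
cleaned = [split_dates(s) for s in samples]
word_index = build_word_index(samples)
print(cleaned)
print(word_index)
```

With the real Tokenizer you would pass the cleaned strings to fit_on_texts and leave char_level at its default of False, so that whole words rather than single characters end up in word_index.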