I am using Facebook's fastText for text classification. I want to know how the fastText library handles numbers in a text string provided as input for word vectorization.
Does fastText typecast each number to a string before creating word vectors?
E.g., 1124 to " 1124 "
Or is some other transformation/preprocessing performed in the background before training?
E.g., 1124 to " one one two four "
What is the best approach for handling numerical data if my input text to fastText contains numbers?
Fasttext doesn't do any preprocessing of numeric tokens. They are treated like other whitespace-separated "words".
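To make that concrete, here is a rough Python sketch. It only imitates whitespace splitting with `str.split()` and is not fasttext's actual tokenizer; the helper name `whitespace_tokens` and the sample sentence are made up for illustration.

```python
def whitespace_tokens(line):
    # Assumption: plain whitespace splitting as a stand-in for how the
    # input line is broken into "words"; no number-specific handling.
    return line.split()


tokens = whitespace_tokens("order 1124 shipped in 3 days")
print(tokens)  # ['order', '1124', 'shipped', 'in', '3', 'days']
```

The number stays a single token, "1124", exactly like any other word; it is not spelled out or otherwise transformed.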
Unless you already have a specific problem with fasttext and numbers in your input, I wouldn't worry about what fasttext does with the numbers. Just use it as normal.
If you have a lot of numbers and they're causing problems (which is possible, since fasttext is unlikely to have useful vectors for most specific numbers), you can preprocess your input to replace them with <NUMBER> or another dummy token. That way, sentences that differ only in their numbers will look the same to fasttext.
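For instance, a minimal sketch of that preprocessing in Python might look like the following. The regex, the helper name `mask_numbers`, and the sample sentences are my own assumptions for illustration; adjust the pattern to whatever counts as a "number" in your data.

```python
import re

NUMBER_TOKEN = "<NUMBER>"

# Assumption: a "number" is a standalone run of digits, optionally with
# "." or "," separators inside it (e.g. 1124, 3.5, 1,124.50).
# Digits embedded in words (e.g. "abc123") are left alone.
_NUMBER_RE = re.compile(r"\b\d+(?:[.,]\d+)*\b")


def mask_numbers(text):
    """Replace every standalone number in text with the dummy token."""
    return _NUMBER_RE.sub(NUMBER_TOKEN, text)


a = mask_numbers("The order 1124 shipped today")
b = mask_numbers("The order 98 shipped today")
print(a)       # The order <NUMBER> shipped today
print(a == b)  # True
```

You would run this over both the training file and any text you later pass in for prediction, so the model sees the same token in both places.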
Whether you want to treat those as the same or not depends on your application.