
FastText window size


I'm currently working on fastText unsupervised learning, and I wanted to clarify something about the context window described in the fastText documentation.

In the description of the fastText library for Python, https://github.com/facebookresearch/fastText/tree/master/python, there are several arguments for training a fastText model; one of them is:

  • ws: size of the context window

My input file contains lines with 2-3 tokens each.

For example:

  • Senior Database Administrator
  • Senior DotNet programmer
  • Network administrator
  • Head Programmer (Mainframe)

The default window size is 5. In the example above, I have lines whose token count is less than the window size. What happens if the window size is bigger than the document length?
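
For reference, here's a minimal sketch of the kind of call I'm making (the file name is a placeholder for my actual input file):

    import fasttext

    # 'titles.txt' stands in for my input file of one short job title per line.
    model = fasttext.train_unsupervised('titles.txt', model='skipgram', ws=5)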


Solution

  • FastText (& related algorithms like word2vec) will simply use as much of the context window as is possible.

    For example, assume a window-size of 5 and the input tokens:

    ['Senior', 'Database', 'Administrator']
    

    When training with the 'center' word 'Senior', the algorithm would be ready to consult up to 5 words in either direction.

    But there are 0 words preceding 'Senior' and only 2 words succeeding it, so only those 2 following words will be considered as neighbors.

    (No 'plug values' will be used as if they were blank neighbors, nor will any 'bleed-through' to neighboring texts occur. A minimal sketch of this clamping behavior appears after this list.)

    Two other related notes to keep in mind:

    • These algorithms do need neighboring words for any training to occur, so any texts with just a single word are essentially no-ops. (If there happens to be a word that only ever appears alone, you might still see a vector for it at the end of training, but in the implementations with which I am familiar, that will just be a randomly-initialized starting vector, completely untrained by real usage examples.)
    • Most implementations will simulate a weighting of neighboring words by not always using exactly your declared window-size, but rather, for each pass over a specific target center word, choosing a random effective window-size from 1 to your chosen window-size. In this way, immediate neighbors are always part of training, while words further away are more often skipped. (See the second sketch below.)
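
Here's a minimal Python sketch (not fastText's actual code) of the boundary-clamping behavior, using the example above:

    def context_words(tokens, center_idx, window):
        """Return the neighbors of tokens[center_idx] within `window` positions."""
        start = max(0, center_idx - window)              # clamp at the start of the text
        end = min(len(tokens), center_idx + window + 1)  # clamp at the end of the text
        return tokens[start:center_idx] + tokens[center_idx + 1:end]

    tokens = ['Senior', 'Database', 'Administrator']
    print(context_words(tokens, 0, 5))  # ['Database', 'Administrator'] -- just 2 neighbors, no padding

Note how the center word at index 0 gets only its 2 real followers as neighbors, regardless of the declared window size of 5.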
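
And here's a sketch of the random effective-window trick from the last note (`effective_window` is an illustrative name, not a fastText function). Over many passes, a neighbor at distance d from the center ends up included with probability (window - d + 1) / window:

    import random

    def effective_window(window):
        # Each visit to a center word draws a fresh effective window in 1..window,
        # so immediate neighbors are always used and distant ones are often skipped.
        return random.randint(1, window)

    window, trials = 5, 100_000
    counts = {d: 0 for d in range(1, window + 1)}
    for _ in range(trials):
        w = effective_window(window)
        for d in range(1, w + 1):
            counts[d] += 1
    for d, c in counts.items():
        print(f"distance {d}: included in {c / trials:.2f} of passes")  # ~1.0, 0.8, 0.6, 0.4, 0.2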