machine-learning, nlp, word2vec

What does the negative sampling method use: sigmoid or softmax?


I have recently been reading this paper, Word2Vec Explained (https://arxiv.org/pdf/1402.3722.pdf), and there is something I can't understand.

On page 3, the authors say that $p$ is defined using the softmax:

$p(D=1|w, c, \theta) = \frac{1}{1+e^{-v_c \cdot v_w}}$


but I am confused, because I have seen that formula as the sigmoid function, not the softmax function.

How do you derive that definition from the softmax?


Solution

  • It can be called a small abuse of notation on the author's part, but it is essentially fine: people sometimes use the terms softmax and sigmoid interchangeably. In this case, however, it really is the sigmoid function, because $D$ is a binary variable, and a two-class softmax collapses to the sigmoid, as shown below.
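
To see the connection, one common way is to treat the softmax as being taken over the two possible values of $D$, with score $v_c \cdot v_w$ for $D=1$ and score $0$ for $D=0$. The softmax then reduces exactly to the sigmoid:

$p(D=1|w, c, \theta) = \frac{e^{v_c \cdot v_w}}{e^{v_c \cdot v_w} + e^{0}} = \frac{1}{1+e^{-v_c \cdot v_w}} = \sigma(v_c \cdot v_w)$

Here is a quick numerical sketch of that equivalence; the vectors below are random placeholders rather than actual word2vec embeddings:

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over the last axis
    z = x - x.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
v_c = rng.normal(size=100)  # placeholder "context" vector (not a trained embedding)
v_w = rng.normal(size=100)  # placeholder "word" vector (not a trained embedding)
score = v_c @ v_w

# Two-class softmax over [score, 0] vs. sigmoid of the score
p_softmax = softmax(np.array([score, 0.0]))[0]
p_sigmoid = sigmoid(score)
print(np.isclose(p_softmax, p_sigmoid))  # True: the two probabilities agree
```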