I have recently been reading this paper: Word2Vec explained (https://arxiv.org/pdf/1402.3722.pdf)
There's something I can't understand.
On page 3, they say that p is defined using softmax:
$p(D=1|w, c, \theta) = \frac{1}{1+e^{-v_c \cdot v_w}}$
But I am confused, because I have seen that formula as the sigmoid function, not the softmax function.
How do you derive that definition from softmax?
It is a small abuse of terminology on the authors' part, but it's totally fine. The sigmoid is the two-class special case of the softmax, which is why people sometimes use the two names interchangeably. In this case it is indeed a sigmoid function, because the problem is binary: $D=1$ or $D=0$.
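
To make the link explicit, here is a short derivation sketch. It assumes a framing the paper does not spell out: treat the problem as a two-class softmax in which the class $D=1$ has score $z = v_c \cdot v_w$ and the class $D=0$ has a fixed score of $0$ (the symbol $z$ is introduced here only for illustration). Then

$$p(D=1|w, c, \theta) = \frac{e^{z}}{e^{z} + e^{0}} = \frac{1}{1 + e^{-z}} = \frac{1}{1+e^{-v_c \cdot v_w}}$$

where the middle step divides the numerator and denominator by $e^{z}$. Fixing the second score to $0$ loses nothing: a two-class softmax with scores $z_1$ and $z_0$ depends only on the difference $z_1 - z_0$, so it can always be rewritten as a sigmoid of that difference.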