machine-learning · deep-learning · word2vec

How to measure bias using word vectors


I'm attempting to understand how bias can be measured using word embeddings, based on this article: https://towardsdatascience.com/gender-bias-word-embeddings-76d9806a0e17

[image from the article: the word-embedding analogy statement discussed below]

What is the bias being identified in the above statement? Is the bias here that a woman cannot be seen as a doctor when a man is involved?

Is an absence of bias for either a man or a woman indicated when there is only a small difference between the (woman, doctor) and (man, doctor) pairs, represented as the vector equation $woman + doctor \approx man + doctor$?


Solution

  • You would expect that

    woman + doctor = man + doctor
    

    Or rewritten:

    woman + doctor - man = doctor
    

    But since the result in that word-embedding space is 'nurse', that is an indicator of bias: women in healthcare are perceived as nurses. Doctors are associated more with men in the corpus on which the embeddings were trained, so it can be concluded that the corpus (and the learned word embedding) has a gender bias.
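
The analogy check above can be sketched with toy vectors. This is a minimal illustration, not real trained embeddings: the 3-d vectors and their values are invented so that dimension 0 acts as a gender direction and dimension 1 as a "medical profession" direction, with `doctor` deliberately leaning toward `man` to mimic the bias described in the answer.

```python
import numpy as np

# Hypothetical toy word vectors (invented for illustration only).
# Dimension 0 ~ gender direction, dimension 1 ~ "medical profession" direction.
vectors = {
    "man":    np.array([ 1.0, 0.0, 0.0]),
    "woman":  np.array([-1.0, 0.0, 0.0]),
    "doctor": np.array([ 0.9, 1.0, 0.0]),  # leans toward the "man" direction
    "nurse":  np.array([-0.9, 1.0, 0.0]),  # leans toward the "woman" direction
}

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# The analogy query from the answer: woman + doctor - man
query = vectors["woman"] + vectors["doctor"] - vectors["man"]

# Nearest neighbour among the candidate words (excluding the query words)
candidates = {w: cosine(query, v) for w, v in vectors.items()
              if w not in ("woman", "man")}
nearest = max(candidates, key=candidates.get)
print(nearest)  # in this biased toy space: 'nurse', not 'doctor'
```

With a real trained model, the same query can be run via gensim's `KeyedVectors.most_similar(positive=['woman', 'doctor'], negative=['man'])`; a noticeably higher rank for 'nurse' than 'doctor' is the kind of signal the article treats as evidence of gender bias in the training corpus.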