I have my own corpus and I train several Word2Vec models on it. What is the best way to evaluate them against each other and choose the best one? (Not manually, obviously - I am looking for various measures.)
It's worth noting that the embeddings are for items, not words, so I can't use any existing benchmarks.
Thanks!
There's no generic way to assess token-vector quality if you're not using real words, against which standard tasks (like the popular analogy-solving) could be tried.
If you have a custom ultimate task, you have to devise your own repeatable scoring method. That will likely either be a subset of your actual final task, or something well-correlated with it. Essentially, whatever ad-hoc method you're using to 'eyeball' the results for sanity should be systematized, saving your judgements from each evaluation so they can be re-run repeatedly against iterative model improvements. A sketch of one such harness is below.
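For example, one way to systematize those eyeball checks is to record them as triplets - an anchor item, an item you judged similar, and one you judged dissimilar - and score each model by how often it agrees. This is just a minimal sketch assuming gensim `Word2Vec` models; the item IDs and model filenames are placeholders:

    # Minimal sketch of a repeatable "systematized eyeball" evaluation,
    # assuming gensim Word2Vec models and hand-curated judgement triplets.
    from gensim.models import Word2Vec

    # Hypothetical hand-curated judgements: (anchor, judged-similar,
    # judged-dissimilar). Item IDs here are placeholders.
    JUDGEMENTS = [
        ("item_123", "item_456", "item_789"),
        ("item_234", "item_567", "item_890"),
        # ...append more as you eyeball results over time
    ]

    def triplet_accuracy(model, judgements):
        """Fraction of triplets where the model ranks the 'similar' item
        closer to the anchor than the 'dissimilar' item."""
        correct, scored = 0, 0
        for anchor, similar, dissimilar in judgements:
            # Skip triplets with items missing from this model's vocabulary.
            if not all(item in model.wv for item in (anchor, similar, dissimilar)):
                continue
            scored += 1
            if model.wv.similarity(anchor, similar) > model.wv.similarity(anchor, dissimilar):
                correct += 1
        return correct / scored if scored else 0.0

    # Compare candidate models on the same fixed judgement set
    # (filenames are placeholders for your saved models).
    for name, path in [("sg_dim100", "model_sg_100.w2v"),
                       ("cbow_dim200", "model_cbow_200.w2v")]:
        model = Word2Vec.load(path)
        print(f"{name}: {triplet_accuracy(model, JUDGEMENTS):.3f}")

The key property is that the judgement set is fixed and versioned, so every new training run gets scored against exactly the same criteria, and the scores stay comparable across experiments.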
(I'd need more info about your data/items and ultimate goals to make further suggestions.)