
How to handle the BERT [UNK] token in the output prediction


I fine-tuned a pre-trained BERT model on my data.
I am trying to build a JSON containing two lists:
first: a list of the model's predictions (desired values)
second: a list of the true values

but the first list contains many ['UNK'] tokens.

Why does this happen, and how can I solve it?
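The original screenshot is not available; as a hypothetical illustration of the JSON shape described above (two parallel lists, with stray [UNK] tokens in the predictions), it might look like this. The key names and values are assumptions, not the asker's actual data:

```python
import json

# Hypothetical reconstruction of the two-list JSON described in the
# question: predictions (polluted with [UNK]) next to the true values.
data = {
    "predictions": [["[UNK]"], ["سلام"], ["[UNK]"]],
    "true_values": [["درود"], ["سلام"], ["خوب"]],
}

# ensure_ascii=False keeps the Persian text readable instead of escaping it.
print(json.dumps(data, ensure_ascii=False, indent=2))
```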

These [UNK] tags push the prediction accuracy close to zero :( because accuracy is based on exact matches between the true and desired values, and the [UNK]s make the desired values differ...
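A tiny sketch (with made-up data) of why [UNK] tokens are so damaging under an exact-match metric: every prediction degraded to [UNK] fails its comparison outright, no matter how close the rest of the output is.

```python
def exact_match_accuracy(pred: list[str], true: list[str]) -> float:
    """Fraction of positions where prediction and truth match exactly."""
    matches = sum(p == t for p, t in zip(pred, true))
    return matches / len(true)

true_vals = ["سلام", "دنیا", "خوب"]
pred_vals = ["سلام", "[UNK]", "خوب"]  # one prediction collapsed to [UNK]

# One of three pairs fails the exact comparison, so accuracy drops to ~0.67.
print(exact_match_accuracy(pred_vals, true_vals))
```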

What can I do about it?


Solution

  • Ultimately, I found the problem... the version of BERT I used was adapted to the Persian language, and I had not fully applied the Persian normalization process :) After completing that phase and doing some debugging of the BERT configuration, it was solved :)
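The answer does not show the exact normalization used. As a minimal sketch of the kind of Persian normalization that reduces [UNK] tokens: Persian text often arrives with Arabic code points (Arabic yeh, Arabic kaf, diacritics) that do not appear in a Persian BERT's vocabulary, so mapping them to their Persian equivalents before tokenizing keeps words in-vocabulary. The character table below is illustrative, not exhaustive; in practice a dedicated library such as hazm covers many more cases.

```python
# Map common Arabic code points to their Persian equivalents, so the input
# matches the character inventory the Persian BERT tokenizer was trained on.
ARABIC_TO_PERSIAN = {
    "\u064A": "\u06CC",  # Arabic yeh  (ي) -> Persian yeh (ی)
    "\u0643": "\u06A9",  # Arabic kaf  (ك) -> Persian kaf (ک)
    "\u0649": "\u06CC",  # alef maksura (ى) -> Persian yeh (ی)
    "\u0629": "\u0647",  # teh marbuta (ة) -> heh (ه)
}

# Arabic diacritics (harakat) rarely occur in Persian pretraining corpora
# and can push otherwise-known words out of the vocabulary.
DIACRITICS = set("\u064B\u064C\u064D\u064E\u064F\u0650\u0651\u0652")

def normalize_persian(text: str) -> str:
    """Replace Arabic letter variants with Persian ones and strip diacritics."""
    out = []
    for ch in text:
        if ch in DIACRITICS:
            continue
        out.append(ARABIC_TO_PERSIAN.get(ch, ch))
    return "".join(out)

# Example: "ايراني" written with Arabic yeh becomes "ایرانی" with Persian yeh.
print(normalize_persian("ايراني"))
```

Running this normalization over every input before tokenization (i.e., before the tokenizer ever sees the raw text) is what keeps the predictions from degenerating into [UNK].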