Say I want to train BERT on (query, answer) sentence pairs with a binary label (1/0) indicating whether the answer is correct. Does BERT allow 512 words/tokens each for the query and the answer, or must the query and answer together fit in 512? [510 after ignoring the [CLS] and [SEP] tokens]
Thanks in advance!
Together, and actually the limit is 509, since besides the [CLS] token there are two [SEP] tokens, one after the question and another after the answer:
[CLS] q_word1 q_word2 ... [SEP] a_word1 a_word2 ... [SEP]
where q_word refers to words in the question and a_word refers to words in the answer.
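A minimal sketch of that token budget in plain Python (the `build_pair` helper and its longest-first truncation strategy are illustrative, not part of any BERT library):

```python
MAX_LEN = 512      # BERT's maximum sequence length
NUM_SPECIAL = 3    # one [CLS] plus two [SEP]

def build_pair(q_tokens, a_tokens, max_len=MAX_LEN):
    """Lay out [CLS] q ... [SEP] a ... [SEP], truncating so the
    combined sequence never exceeds max_len tokens."""
    budget = max_len - NUM_SPECIAL  # 509 slots shared by question + answer
    q, a = list(q_tokens), list(a_tokens)
    # Simple strategy: trim the longer segment until the pair fits
    while len(q) + len(a) > budget:
        if len(q) >= len(a):
            q.pop()
        else:
            a.pop()
    return ["[CLS]"] + q + ["[SEP]"] + a + ["[SEP]"]

# Two 400-token segments exceed the budget, so the result is capped at 512
seq = build_pair(["w"] * 400, ["w"] * 400)
print(len(seq))  # 512
```

In practice the HuggingFace tokenizers apply an equivalent truncation for you when you pass both sentences and set `truncation=True`, but the arithmetic is the same: 512 minus 3 special tokens leaves 509 for the actual word pieces.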