Is calling the tokenizer on a batch significantly faster than calling it on each item in the batch? e.g.
encodings = tokenizer(sentences)
# vs
encodings = [tokenizer(x) for x in sentences]
I ended up just timing both, in case it's interesting for someone else:
%%timeit
for _ in range(10**4): tokenizer("Lorem ipsum dolor sit amet, consectetur adipiscing elit.")
785 ms ± 24.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
%%timeit
tokenizer(["Lorem ipsum dolor sit amet, consectetur adipiscing elit."]*10**4)
266 ms ± 6.52 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
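For anyone who wants to reproduce this outside a notebook, here is a minimal sketch, assuming a fast (Rust-backed) tokenizer such as bert-base-uncased (any model name will do; the exact numbers depend on the tokenizer and machine):

import time
from transformers import AutoTokenizer

# assumed model for illustration; requires the weights/tokenizer files to be downloadable or cached
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
sentence = "Lorem ipsum dolor sit amet, consectetur adipiscing elit."
n = 10**4

# one tokenizer call per sentence
start = time.perf_counter()
for _ in range(n):
    tokenizer(sentence)
per_item = time.perf_counter() - start

# a single batched call over all sentences at once
start = time.perf_counter()
tokenizer([sentence] * n)
batched = time.perf_counter() - start

print(f"per-item: {per_item:.3f}s, batched: {batched:.3f}s")

As far as I understand, the batched call wins mostly because the fast tokenizer encodes the whole list in one pass on the Rust side (and may parallelize it), instead of paying the Python call overhead once per sentence.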