How to store BERT embeddings in Cassandra

I want to use Cassandra as a feature store for precomputed BERT embeddings. Each row would consist of roughly 800 floating-point values (e.g. -0.18294132). Should I store all 800 in one large string column, or in 800 separate columns?

The read pattern is simple: on read we would want every value in a row. I'm not sure which option would be better for serialization speed.

Solution

Storing each value in a separate column will be quite inefficient: every cell carries its own metadata (the write time, for example), which adds significant overhead (at least 8 bytes per value). Storing the data as a string is also not very efficient, and it adds complexity on the application side.
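To put rough numbers on this, here is a back-of-the-envelope sketch in Python. It assumes 4-byte floats, a comma-separated decimal string for the text option, and counts only the 8-byte write timestamp as per-cell overhead. The sample row is made up, and real on-disk sizes will differ with the Cassandra version, compression, and the extra framing a collection type adds.

```python
import struct

N_VALUES = 800        # values per embedding row (from the question)
TIMESTAMP_BYTES = 8   # per-cell write timestamp, the minimum overhead cited above

# Hypothetical sample row: the example value repeated 800 times.
embedding = [-0.18294132] * N_VALUES

# Option 1: 800 separate float columns, each cell = 4 data bytes + a timestamp.
separate_columns = N_VALUES * (4 + TIMESTAMP_BYTES)

# Option 2: one text cell holding comma-separated decimal strings.
text = ",".join(repr(v) for v in embedding)
as_string = len(text.encode("utf-8")) + TIMESTAMP_BYTES

# Option 3: one cell of packed 4-byte floats. This is a lower bound for a
# binary encoding; it ignores any framing the storage format adds.
as_binary = len(struct.pack(f">{N_VALUES}f", *embedding)) + TIMESTAMP_BYTES

print("separate columns:", separate_columns, "bytes")
print("single string:   ", as_string, "bytes")
print("single binary:   ", as_binary, "bytes")
```

Under these assumptions the single binary cell comes out at roughly a third of either alternative, and the string option additionally has to be parsed back into numbers on every read.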

I would suggest storing the data as a frozen list of integers/longs or doubles/floats, depending on your requirements (values like -0.18294132 need float or double). Something like:

create table ks.bert(
  rowid int primary key,
  data frozen<list<int>>
);

In this case, the whole list is effectively serialized as a single binary blob that occupies just one cell.
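Reading and writing such a row from the application could look roughly like the sketch below. It assumes the DataStax Python driver (`cassandra-driver`) with its default named-tuple rows and a `session` that is already connected; the helper names `store_embedding` and `load_embedding` are made up for illustration. The values passed in must match the element type declared on the column.

```python
# Assumes `session` is a cassandra.cluster.Session from the DataStax Python
# driver, created elsewhere. Function names here are illustrative only.

INSERT_CQL = "INSERT INTO ks.bert (rowid, data) VALUES (%s, %s)"
SELECT_CQL = "SELECT data FROM ks.bert WHERE rowid = %s"


def store_embedding(session, rowid, values):
    """Write the whole vector into the single frozen-list cell."""
    session.execute(INSERT_CQL, (rowid, list(values)))


def load_embedding(session, rowid):
    """Read the whole vector back in one query; None if the row is missing."""
    row = session.execute(SELECT_CQL, (rowid,)).one()
    return list(row.data) if row is not None else None
```

Because the list is frozen, it is always written and read as a whole, which matches the "read every value in a row" access pattern from the question.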