
Representation of int8 and int16 scalar fields in schemapb


Looking at the Proto definitions in schema.proto, I see the following types are allowed for scalar fields:

message ScalarField {
  oneof data {
    BoolArray bool_data = 1;
    IntArray int_data = 2;
    LongArray long_data = 3;
    FloatArray float_data = 4;
    DoubleArray double_data = 5;
    StringArray string_data = 6;
    BytesArray bytes_data = 7;
    ArrayArray array_data = 8;
    JSONArray json_data = 9;
  }
}

Among the integer types, Int and Long are the only two available. However, Milvus also supports Int8 and Int16 types, and it looks like these are also represented using an IntArray.

How exactly does Milvus store int8 and int16 values in a []int32 slice? Is there any packing happening (i.e. four int8s stored in a single int32 element), or does each int8 and int16 occupy the full 4 bytes of space?


Solution

  • Protobuf has no 8-bit or 16-bit integer types; its smallest integer scalars are 32 bits wide (int32, uint32, and so on): https://protobuf.dev/programming-guides/proto3/#scalar

    So int8/int16 values are transferred as int32 values in the RPC layer. There is no packing: each int8 or int16 value occupies one element of the IntArray.

    In the client application, you supply int8 values; these are encoded as int32 values and sent over the RPC channel. On the server side, the int32 values are decoded back to int8 values and written to storage.
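
    The round trip described above can be sketched in Go as follows. This is only an illustration of the widening and narrowing steps, not Milvus's actual code: the function names widenInt8 and narrowToInt8 are hypothetical, and the range check on the receiving side is an assumption about what a careful decoder would do.

```go
package main

import (
	"fmt"
	"math"
)

// widenInt8 is a hypothetical helper: it copies each int8 into its own
// int32 element, one value per slot, with no packing.
func widenInt8(vals []int8) []int32 {
	out := make([]int32, len(vals))
	for i, v := range vals {
		out[i] = int32(v) // sign-extending conversion
	}
	return out
}

// narrowToInt8 is a hypothetical helper: it converts the int32 values back
// to int8 and rejects anything outside the int8 range (assumed check).
func narrowToInt8(vals []int32) ([]int8, error) {
	out := make([]int8, len(vals))
	for i, v := range vals {
		if v < math.MinInt8 || v > math.MaxInt8 {
			return nil, fmt.Errorf("value %d at index %d does not fit in int8", v, i)
		}
		out[i] = int8(v)
	}
	return out, nil
}

func main() {
	// "Client" side: four int8 values become four int32 elements.
	wire := widenInt8([]int8{-128, -1, 0, 127})
	fmt.Println(len(wire), wire) // prints: 4 [-128 -1 0 127]

	// "Server" side: narrow back to int8 before storing.
	stored, err := narrowToInt8(wire)
	fmt.Println(stored, err)
}
```

    In memory, each element of the []int32 slice takes 4 bytes regardless of the original type. On the wire, protobuf encodes int32 fields as varints, so small non-negative values take fewer bytes there, while negative values take more.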