In Parquet files, data is stored using a small number of primitive types. There is, however, the concept of higher-level logical types (also called converted types). For example, a DECIMAL(10,2) value may be stored as a byte array of length 3, i.e., an unscaled integer, where the schema defines the division by 100 that yields the fixed-precision decimal.
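To make the DECIMAL case concrete, here is a minimal sketch of how I understand the decoding. It assumes the byte array holds the unscaled value as a big-endian two's-complement integer; `decode_decimal` is a name I made up for illustration, not part of any Parquet library.

```python
from decimal import Decimal


def decode_decimal(raw: bytes, scale: int) -> Decimal:
    """Turn a stored byte array into a decimal, given the schema's scale."""
    # Assumption: the bytes are a big-endian two's-complement integer.
    unscaled = int.from_bytes(raw, byteorder="big", signed=True)
    # Shift the decimal point left by `scale` digits (divide by 10**scale).
    return Decimal(unscaled).scaleb(-scale)


# A 3-byte array holding the unscaled integer 12345, with scale 2.
raw = (12345).to_bytes(3, byteorder="big", signed=True)
print(decode_decimal(raw, 2))  # 123.45
```

Using `Decimal` rather than float division keeps the result exact, which is the point of a fixed-precision type.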
My question is this: where is the mapping from the numeric logical-type codes to identifiers such as DECIMAL defined, and how are those types further specified?
As far as I understand, the schema thrift spec block looks like this:
thrift_spec = (0, type(I32), type_length(I32),
repetition_type(I32), name(string),
num_children(I32), converted_type(I32), ...
)
It is the meaning of the last field, converted_type, that I am after, along with what further information may follow it in the spec.
A brief description is given here, so I was right about DECIMALs. How exactly the others are used remains somewhat opaque.
https://github.com/Parquet/parquet-format/blob/master/src/thrift/parquet.thrift#L65
Specifically, the stored integer is divided by 10**b, where b is the scale, the next 32-bit integer in the spec block; the precision follows it as a further 32-bit integer.
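For reference, the code-to-name lookup can be sketched as below. The numeric values are the ones I recall from the ConvertedType enum in parquet.thrift, and they should be checked against the file itself, since entries have been added over time. The `describe` helper and its parameter names are hypothetical.

```python
# Assumption: values recalled from the ConvertedType enum in parquet.thrift;
# verify against the version of the file you are reading.
CONVERTED_TYPES = {
    0: "UTF8", 1: "MAP", 2: "MAP_KEY_VALUE", 3: "LIST", 4: "ENUM",
    5: "DECIMAL", 6: "DATE", 7: "TIME_MILLIS", 8: "TIME_MICROS",
    9: "TIMESTAMP_MILLIS", 10: "TIMESTAMP_MICROS",
    11: "UINT_8", 12: "UINT_16", 13: "UINT_32", 14: "UINT_64",
    15: "INT_8", 16: "INT_16", 17: "INT_32", 18: "INT_64",
    19: "JSON", 20: "BSON", 21: "INTERVAL",
}


def describe(converted_type, scale=None, precision=None):
    """Return a readable name for a schema element's converted_type."""
    name = CONVERTED_TYPES.get(converted_type, f"UNKNOWN({converted_type})")
    # DECIMAL is the only entry here that needs the extra scale and
    # precision integers from the schema element.
    if name == "DECIMAL":
        return f"DECIMAL({precision},{scale})"
    return name


print(describe(5, scale=2, precision=10))  # DECIMAL(10,2)
print(describe(0))                         # UTF8
```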