Parquet 1.10.0 introduces two new index structures, ColumnIndex and OffsetIndex. They are documented here: https://github.com/apache/parquet-format/blob/master/PageIndex.md
From the document, I clearly understand the idea behind ColumnIndex, which points to the pages inside each column chunk. But I don't quite understand the idea behind OffsetIndex.
As the document says, the OffsetIndex is used to navigate to the rows identified by the ColumnIndex. But the ColumnIndex only points to pages, and each page is compressed as a whole. How, then, can the OffsetIndex be used to navigate to, for example, a single row inside a row group?
After reading the design document here: https://docs.google.com/document/d/1sBACp8Lbutuj1Zxdowvsrlm8ku4BFxf8U_Do5K2wSO4/edit, this is my understanding.
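In one sentence: a ColumnIndex stores the statistics (such as min/max values) of all pages in one column chunk, while the OffsetIndex stores the exact location of each of those pages in the file. The OffsetIndex does not let a reader jump to an individual row inside a compressed page. Each entry (a PageLocation) records the page's offset, its compressed size, and the index of its first row within the row group, so a reader can map a row index to the page containing it, read and decompress only that page, and then skip to the wanted row inside it.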
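To make the division of labour concrete, here is a minimal Python sketch. It assumes the indexes are already decoded (no Thrift parsing). The PageLocation fields mirror the parquet-format Thrift definition; the ColumnIndex class is a simplified, hypothetical stand-in that holds plain comparable min/max values instead of encoded byte strings, and the helpers `pages_matching`, `locate_row` and `page_row_range` are illustrative names, not a real library API.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class PageLocation:
    # Mirrors the fields of PageLocation in parquet-format's Thrift definition.
    offset: int                # file offset of the page
    compressed_page_size: int  # page size in bytes, including the page header
    first_row_index: int       # index of the page's first row within the row group


@dataclass
class ColumnIndex:
    # Hypothetical simplification: the real struct holds encoded min/max
    # byte strings plus null_pages, boundary_order and optional null_counts.
    min_values: list
    max_values: list


def pages_matching(ci: ColumnIndex, value) -> list[int]:
    """ColumnIndex step: page numbers whose [min, max] range may contain value."""
    return [i for i, (lo, hi) in enumerate(zip(ci.min_values, ci.max_values))
            if lo <= value <= hi]


def page_row_range(pages: list[PageLocation], i: int, rows_in_group: int) -> tuple[int, int]:
    """OffsetIndex step: half-open row range [first, end) covered by page i."""
    end = pages[i + 1].first_row_index if i + 1 < len(pages) else rows_in_group
    return pages[i].first_row_index, end


def locate_row(pages: list[PageLocation], row: int) -> tuple[int, int, int]:
    """OffsetIndex step: (offset, size, rows_to_skip) for the page holding row."""
    starts = [p.first_row_index for p in pages]
    i = bisect_right(starts, row) - 1  # last page starting at or before row
    if i < 0:
        raise IndexError(row)
    page = pages[i]
    # The whole page must still be read and decompressed; the reader then
    # skips rows_to_skip rows inside it to reach the requested row.
    return page.offset, page.compressed_page_size, row - page.first_row_index


# Made-up example: one column chunk with three pages in a 3000-row row group.
offset_index = [PageLocation(4, 100, 0), PageLocation(104, 120, 1000),
                PageLocation(224, 90, 2500)]
column_index = ColumnIndex(min_values=[1, 50, 200], max_values=[40, 180, 300])

print(pages_matching(column_index, 120))           # [1]
print(page_row_range(offset_index, 1, 3000))       # (1000, 2500)
print(locate_row(offset_index, 1234))              # (104, 120, 234)
```

The ColumnIndex alone only says *which* pages might match; the row ranges and byte ranges needed to actually fetch those pages, and to line up pages of other columns covering the same rows, come from the OffsetIndex.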