Converting parquet file to Golang struct with nested elements


I am trying to read a Parquet file with nested arrays/structs in Go using the xitongsys/parquet-go library. The list data is not being read: the nested values come back empty. Below are my structs in Go:

type Play struct {
    SID            string   `parquet:"name=si, type=BYTE_ARRAY, convertedtype=UTF8, encoding=PLAIN_DICTIONARY, repetitiontype=OPTIONAL" json:"si,omitempty"`
    TimeStamp      int      `parquet:"name=ts, type=INT64, repetitiontype=OPTIONAL" json:"ts,omitempty"`
    SingleID       int      `parquet:"name=sg, type=INT64, repetitiontype=OPTIONAL" json:"sg,omitempty"`
    PID            int      `parquet:"name=playid, type=INT64, repetitiontype=OPTIONAL" json:"playid,omitempty"`
    StartTimeStamp string   `parquet:"name=startts, type=BYTE_ARRAY,repetitiontype=OPTIONAL"`
    Price          []Price1 `parquet:"name=price, type=LIST, repetitiontype=REQUIRED" json:"price,omitempty"`
}

type Price1 struct {
    CurrID int    `parquet:"name=currId, type=INT64, repetitiontype=REQUIRED" json:"currId,omitempty"`
    LPTag  string `parquet:"name=lptag, type=BYTE_ARRAY,convertedtype=UTF8, repetitiontype=REQUIRED" json:"lptag,omitempty"`
    LPrice Money  `parquet:"name=lpmoney, type=STRUCT" json:"lpmoney,omitempty"`
}

type Money struct {
    AdmCurrCode  string `parquet:"name=admCC, type=BYTE_ARRAY, repetitiontype=OPTIONAL" json:"admCC,omitempty"`
    AdmCurrValue string `parquet:"name=admCV, type=BYTE_ARRAY" json:"admCV,omitempty"`
}

CurrID and LPTag come back empty even though the Parquet file contains valid values.


Solution

  • I found that the github.com/segmentio/parquet-go package reads the file correctly. Do you need to stick with the github.com/xitongsys/parquet-go package? If not, this works for me (the generic ReadFile call requires Go 1.18 or later):

    package main
    
    import (
        "fmt"
    
        "github.com/segmentio/parquet-go"
    )
    
    type Play struct {
        SID            string  `parquet:"si"`
        TimeStamp      int     `parquet:"ts"`
        SingleID       int     `parquet:"sg"`
        PID            int     `parquet:"playid"`
        StartTimeStamp string  `parquet:"startts"`
        // The "list" option maps the slice to the Parquet LIST logical type.
        Price          []Price `parquet:"price,list"`
    }
    
    type Price struct {
        CurrID int    `parquet:"currId"`
        LPTag  string `parquet:"lptag"`
        LPrice Money  `parquet:"lpmoney"`
    }
    
    type Money struct {
        AdmCurrCode  string `parquet:"admCC"`
        AdmCurrValue string `parquet:"admCV"`
    }
    
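    func main() {
        // ReadFile decodes every row of the file into a []Play.
        rows, err := parquet.ReadFile[Play]("s3.parquet")
        if err != nil {
            panic(err)
        }
    
        // %+v prints field names, so empty nested values are easy to spot.
        for _, row := range rows {
            fmt.Printf("%+v\n", row)
        }
    }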