
Using the data types defined in datatype.go of the Go Apache Arrow implementation to construct a schema


I am learning Apache Arrow and want to understand how to create a schema and an Arrow record. The material I have found so far only uses the primitive types to build a schema, like this:

schema := arrow.NewSchema(
    []arrow.Field{
        {Name: "f1-i32", Type: arrow.PrimitiveTypes.Int32},
        {Name: "f2-f64", Type: arrow.PrimitiveTypes.Float64},
    },
    nil,
)

There are some data types not present in PrimitiveTypes that I want to work with, for example bool or decimal128. Looking through the Go Arrow library, I came across the file datatype.go, which lists all the data types I want to use. But the constants defined there are of type Type, not of type DataType, which is what a schema field requires.

So, I have the following three questions:

  1. How can I use these datatypes from datatype.go, if possible, for constructing my schema?
  2. How can I specify a precision and scale if I want to use a decimal type?
  3. Can you show an example of using an extension type?

Solution

  • The named constants defined in datatype.go are type IDs, not data types themselves. The library already uses them inside the concrete types you are looking for, such as type Decimal128Type struct and type BooleanType struct. If you inspect the ID methods of these structs, each one returns the datatype.go constant whose name matches the struct's name. These structs also implement the DataType interface, which means you can assign them to arrow.Field.Type, because that field's type is DataType.
    To make this concrete:
    The BOOL constant defined in datatype.go is the return value of the ID method of type BooleanType struct in datatype_fixedwidth.go:
    func (t *BooleanType) ID() Type { return BOOL }
    The same holds for type Decimal128Type struct:
    func (*Decimal128Type) ID() Type { return DECIMAL128 }

    The methods of one of these structs show that it implements the DataType interface:

    func (*Decimal128Type) BitWidth() int
    func (t *Decimal128Type) Fingerprint() string
    func (*Decimal128Type) ID() Type
    func (*Decimal128Type) Name() string
    func (t *Decimal128Type) String() string
    

    Those are the methods of type Decimal128Type struct.
    And here is the definition of the DataType interface:

    type DataType interface {
        ID() Type
        // Name is name of the data type.
        Name() string
        Fingerprint() string
    }
    

    type BooleanType struct implements it as well.

    Hence you can use either of them for the Type field of:

    type Field struct {
        Name     string   // Field name
        Type     DataType // The field's data type
        Nullable bool     // Fields can be nullable
        Metadata Metadata // The field's metadata, if any
    }
    

    A demonstrative example, which also sets the precision and scale of the decimal type through the struct's fields:

    package main
    
    import (
        "fmt"
    
        "github.com/apache/arrow/go/arrow"
    )
    
    func main() {
        booltype := &arrow.BooleanType{}
        decimal128type := &arrow.Decimal128Type{Precision: 1, Scale: 1}
    
        schema := arrow.NewSchema(
            []arrow.Field{
                {Name: "f1-bool", Type: booltype},
                {Name: "f2-decimal128", Type: decimal128type},
            },
            nil,
        )
    
        fmt.Println(schema)
    }
    

    Output:

    schema:
      fields: 2
        - f1-bool: type=bool
        - f2-decimal128: type=decimal(1, 1)
    

    You can find these types in the documentation.
    The documentation also covers extension types, but I am not familiar with them, so I cannot show an example.
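
To see why a pointer to one of these structs is accepted wherever a DataType is expected, here is a minimal sketch that runs without the Arrow module. Everything in it is a simplified stand-in written for illustration: the Type, DataType, BooleanType, Decimal128Type and Field declarations only mirror the shapes quoted above, the Fingerprint bodies are made up, and describe is a hypothetical helper. It is not the library's source.

```go
package main

import "fmt"

// Type stands in for arrow.Type, the type of the constants in datatype.go.
type Type int

const (
	BOOL Type = iota
	DECIMAL128
)

// DataType mirrors the three-method interface quoted in the answer.
type DataType interface {
	ID() Type
	Name() string
	Fingerprint() string
}

// BooleanType is a stand-in; its methods use pointer receivers,
// so *BooleanType (not BooleanType) satisfies DataType.
type BooleanType struct{}

func (t *BooleanType) ID() Type            { return BOOL }
func (t *BooleanType) Name() string        { return "bool" }
func (t *BooleanType) Fingerprint() string { return "bool" } // illustrative only

// Decimal128Type is a stand-in that carries precision and scale as fields.
type Decimal128Type struct {
	Precision int32
	Scale     int32
}

func (*Decimal128Type) ID() Type     { return DECIMAL128 }
func (*Decimal128Type) Name() string { return "decimal" }
func (t *Decimal128Type) Fingerprint() string { // illustrative only
	return fmt.Sprintf("decimal(%d,%d)", t.Precision, t.Scale)
}

// Field is a reduced version of the Field struct shown in the answer.
type Field struct {
	Name string
	Type DataType
}

// Compile-time checks: the build fails if either pointer type
// stops satisfying the DataType interface.
var (
	_ DataType = (*BooleanType)(nil)
	_ DataType = (*Decimal128Type)(nil)
)

// describe is a hypothetical helper that reads a field through the interface.
func describe(f Field) string {
	return fmt.Sprintf("%s: id=%d name=%s", f.Name, f.Type.ID(), f.Type.Name())
}

func main() {
	fields := []Field{
		{Name: "flag", Type: &BooleanType{}},
		{Name: "price", Type: &Decimal128Type{Precision: 10, Scale: 2}},
	}
	for _, f := range fields {
		fmt.Println(describe(f))
	}
}
```

The point of the sketch is that the constants (BOOL, DECIMAL128) are only what ID returns, while the struct pointers are the values that go into the Type field. Precision and scale are ordinary struct fields set in the composite literal, in the same way as in the arrow.Decimal128Type example above.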