I am learning Apache Arrow and want to learn more about how to create a schema and an Arrow record. I have consulted some material, but so far all of it uses only the primitive types when building a schema, like this:
schema := arrow.NewSchema(
    []arrow.Field{
        {Name: "f1-i32", Type: arrow.PrimitiveTypes.Int32},
        {Name: "f2-f64", Type: arrow.PrimitiveTypes.Float64},
    },
    nil,
)
There are some data types not present in PrimitiveTypes that I want to work with. For example, I want to use bool or decimal128. I was looking through the Go Arrow library and came across the file datatype.go, which lists all the data types that I want to use. But the constants there are not of type DataType, which is what the schema requires.
So, I have the following three questions:
datatype.go, if possible, for constructing my schema?
The named constants defined in datatype.go are not data types themselves; each one identifies a data type that is implemented as a struct elsewhere in the package. Two of these structs are type Decimal128Type struct and type BooleanType struct. If you inspect the source of their ID methods, each returns the constant in datatype.go whose name resembles the struct's name. These structs already implement the DataType interface, which means you can assign them to arrow.Field.Type, because that field's type is DataType.
To be concrete: the BOOL constant defined in datatype.go is the return value of the ID method of type BooleanType struct in datatype_fixedwidth.go.
func (t *BooleanType) ID() Type { return BOOL }
The same holds for type Decimal128Type struct:
func (*Decimal128Type) ID() Type { return DECIMAL128 }
The methods of one of these structs, which show that it implements the DataType interface:
func (*Decimal128Type) BitWidth() int
func (t *Decimal128Type) Fingerprint() string
func (*Decimal128Type) ID() Type
func (*Decimal128Type) Name() string
func (t *Decimal128Type) String() string
Those methods belong to type Decimal128Type struct.
Here is the definition of the DataType interface:
type DataType interface {
    ID() Type
    // Name is name of the data type.
    Name() string
    Fingerprint() string
}
type BooleanType struct also implements it.
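The relationship between the constants and the structs can be sketched without the Arrow library at all. The block below is only an illustration: Type, DataType, BooleanType and Decimal128Type are local stand-ins that imitate the shapes quoted above, the Name and Fingerprint return values are placeholders, and describe is a hypothetical helper that is not part of Arrow. It shows that a pointer to one of the structs satisfies the interface, while a bare constant such as BOOL does not.

```go
package main

import "fmt"

// Type is a local stand-in for arrow.Type: a plain enum of type IDs.
type Type int

const (
	BOOL Type = iota
	DECIMAL128
)

// DataType mirrors the interface quoted in the answer.
type DataType interface {
	ID() Type
	Name() string
	Fingerprint() string
}

// BooleanType is a local stand-in for arrow.BooleanType.
type BooleanType struct{}

func (t *BooleanType) ID() Type            { return BOOL }
func (t *BooleanType) Name() string        { return "bool" }  // placeholder value
func (t *BooleanType) Fingerprint() string { return "bool" }  // placeholder value

// Decimal128Type is a local stand-in for arrow.Decimal128Type.
type Decimal128Type struct {
	Precision int32
	Scale     int32
}

func (*Decimal128Type) ID() Type     { return DECIMAL128 }
func (*Decimal128Type) Name() string { return "decimal" } // placeholder value
func (t *Decimal128Type) Fingerprint() string {
	// Placeholder format, not the library's real fingerprint.
	return fmt.Sprintf("decimal(%d,%d)", t.Precision, t.Scale)
}

// Compile-time checks: the build fails if either pointer type
// stops satisfying DataType.
var (
	_ DataType = (*BooleanType)(nil)
	_ DataType = (*Decimal128Type)(nil)
)

// describe is a hypothetical helper that accepts any DataType.
func describe(dt DataType) string {
	return fmt.Sprintf("%s (id=%d)", dt.Name(), dt.ID())
}

func main() {
	fmt.Println(describe(&BooleanType{}))
	fmt.Println(describe(&Decimal128Type{Precision: 1, Scale: 1}))
	// describe(BOOL) would not compile: BOOL is a Type, not a DataType.
}
```

Because the methods in this sketch are declared on pointer receivers, as in the quoted ID methods, it is the pointer (&BooleanType{}) that satisfies the interface here, not the struct value.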
Hence you can use them for the Type field of:
type Field struct {
    Name     string   // Field name
    Type     DataType // The field's data type
    Nullable bool     // Fields can be nullable
    Metadata Metadata // The field's metadata, if any
}
A demonstrative example:
package main

import (
    "fmt"

    "github.com/apache/arrow/go/arrow"
)

func main() {
    booltype := &arrow.BooleanType{}
    decimal128type := &arrow.Decimal128Type{Precision: 1, Scale: 1}

    schema := arrow.NewSchema(
        []arrow.Field{
            {Name: "f1-bool", Type: booltype},
            {Name: "f2-decimal128", Type: decimal128type},
        },
        nil,
    )

    fmt.Println(schema)
}
Output:
schema:
  fields: 2
    - f1-bool: type=bool
    - f2-decimal128: type=decimal(1, 1)
You can find them in the documentation.
There are also some things related to extension types.
I am not familiar with extension types, so I could not show an example that uses them. If you are familiar with them, you should be able to work it out easily.