I am using the Flink SQL API and I am a bit lost among all the 'schema' types: TableSchema, Schema (from org.apache.flink.table.descriptors.Schema) and TypeInformation.
A TableSchema can be created from a TypeInformation, a TypeInformation can be created from a TableSchema, and a Schema can be created from a TableSchema. But it looks like a Schema cannot be converted back to a TypeInformation or a TableSchema (?)
Why are there 3 different types of objects to store the same kind of information?
For example, let's say that I have a schema string coming from an Avro schema file, and that I want to add a new field to it. The only solution I have found is:
String mySchemaRaw = ...;
TypeInformation<Row> typeInfo = AvroSchemaConverter.convertToTypeInfo(mySchemaRaw);
Schema newSchema = new Schema().schema(TableSchema.fromTypeInfo(typeInfo));
newSchema = newSchema.field("newField", ...);
// Need the newSchema as a TableSchema
Is this the normal way to use these objects? (It looks weird to me.)
TypeInformation and TableSchema solve different things. TypeInformation is physical information about how to ship a record class (e.g. a row or a POJO) from one operator to another. TableSchema describes the schema of a table independent of the underlying per-record type. It is similar to the schema part of a CREATE TABLE name (a INT, b BIGINT) DDL statement. In SQL one also doesn't define a table like CREATE TABLE name ROW(a INT, b BIGINT). But it is true that schema and row type are related, which is why converter methods are provided. The differences become bigger once concepts like PRIMARY KEY etc. are introduced.
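The converter methods in question can be sketched as follows. This is a minimal sketch against the Flink 1.x Table API; the field names "a" and "b" are made up for illustration:

```java
import org.apache.flink.api.common.typeinfo.TypeInformation;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.table.api.TableSchema;
import org.apache.flink.table.descriptors.Schema;
import org.apache.flink.types.Row;

public class SchemaConversions {
    public static void main(String[] args) {
        // Physical per-record type: a Row with two named fields.
        TypeInformation<Row> rowType =
                Types.ROW_NAMED(new String[] {"a", "b"}, Types.INT, Types.LONG);

        // Logical table schema derived from the physical row type.
        TableSchema tableSchema = TableSchema.fromTypeInfo(rowType);

        // ...and back: the row type used to ship records between operators.
        TypeInformation<Row> backToRowType = tableSchema.toRowType();

        // A descriptor Schema wrapping the table schema.
        Schema descriptorSchema = new Schema().schema(tableSchema);

        System.out.println(tableSchema);
        System.out.println(backToRowType);
    }
}
```

Note that the conversion is lossy in one direction only by API design: TableSchema and TypeInformation convert back and forth, while the descriptor Schema is write-only from the user's perspective.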
Schema is the current way of specifying non-SQL concepts such as time attributes and field mappings.
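For instance, declaring an event-time attribute is one of those non-SQL concepts. A minimal sketch with the Flink 1.x descriptor API, using made-up field names "user" and "ts":

```java
import java.util.Map;
import org.apache.flink.table.api.Types;
import org.apache.flink.table.descriptors.Rowtime;
import org.apache.flink.table.descriptors.Schema;

public class TimeAttributeSchema {
    public static void main(String[] args) {
        Schema schema = new Schema()
                .field("user", Types.STRING())
                .field("ts", Types.SQL_TIMESTAMP())
                // Marks the previously declared field "ts" as an event-time
                // attribute with periodic bounded watermarks (60 s delay).
                .rowtime(new Rowtime()
                        .timestampsFromField("ts")
                        .watermarksPeriodicBounded(60000));

        // Descriptors are ultimately flattened into string properties.
        Map<String, String> props = schema.toProperties();
        props.forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

There is no place for such a rowtime declaration in a plain TableSchema or TypeInformation, which is why the descriptor Schema exists as a separate type.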