I am trying to write a function that computes the diff between two Avro schemas and generates a new schema from it.
schema_one = {
  "type": "record",
  "name": "schema_one",
  "namespace": "test",
  "fields": [
    {
      "name": "type",
      "type": "string"
    },
    {
      "name": "id",
      "type": "string"
    }
  ]
}
schema_two = {
  "type": "record",
  "name": "schema_two",
  "namespace": "test",
  "fields": [
    {
      "name": "type",
      "type": "string"
    }
  ]
}
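To get the fields that are in schema_one but not in schema_two:
import org.apache.avro.Schema._
import org.apache.avro.{Schema, SchemaBuilder}
import scala.collection.JavaConverters._ // needed for .asScala / .asJava

// schema_one and schema_two are parsed org.apache.avro.Schema instances
val diff: Set[Schema.Field] =
  schema_one.getFields.asScala.toSet.filterNot(schema_two.getFields.asScala.toSet)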
So far, so good.
I want to build a new schema from diff and I expect it to be:
schema_three = {
  "type": "record",
  "name": "schema_three",
  "namespace": "test",
  "fields": [
    {
      "name": "id",
      "type": "string"
    }
  ]
}
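I can't seem to find any method in Avro's SchemaBuilder to achieve this without explicitly naming each field, i.e. to build a Schema from a collection of Schema.Field objects.
For example, something like this, where fromFields is a hypothetical method I could not find in SchemaBuilder: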
SchemaBuilder.record("schema_three").namespace("test").fromFields(diff)
Is there a way to achieve this? I'd appreciate any comments.
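I was able to achieve this using the Kite SDK ("org.kitesdk" % "kite-data-core" % "1.1.0"): build one single-field record schema per field in diff, then merge them.
import org.kitesdk.data.spi.SchemaUtil

val schema_namespace = schema_one.getNamespace
val schema_name = schema_one.getName

// Build one single-field record schema per field in diff.
// Note: every record reuses schema_one's name and namespace.
val schemas = diff.map { f =>
  SchemaBuilder
    .record(schema_name)
    .namespace(schema_namespace)
    .fields()
    .name(f.name())
    .`type`(f.schema())
    .noDefault()
    .endRecord()
}

// Merge the single-field records into one record schema.
val schema_three = SchemaUtil.merge(schemas.asJava)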
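If pulling in Kite is not desirable, the same result can probably be reached with plain Avro. The sketch below is only one possible approach and assumes Avro 1.9 or later, where the copy constructor `new Schema.Field(field, schema)` is available. The object name `SchemaDiff` and the method `diffSchema` are made up for illustration. A `Schema.Field` that already belongs to a record cannot be added to another record, so each field is copied first. Working on the field list rather than a `Set` also keeps the original field order.

```scala
import org.apache.avro.Schema
import scala.collection.JavaConverters._

// Hypothetical helper, not part of Avro: builds a record schema from the
// fields of `left` that are absent from `right`.
object SchemaDiff {
  def diffSchema(left: Schema, right: Schema, name: String, namespace: String): Schema = {
    val rightFields = right.getFields.asScala.toSet

    // Copy each remaining field; a Field already attached to a record
    // cannot be reused in a new one. Assumes Avro 1.9+ for this constructor.
    val copies = left.getFields.asScala
      .filterNot(rightFields)
      .map(f => new Schema.Field(f, f.schema()))

    // Create an empty, non-error record with no doc string, then attach the fields.
    val result = Schema.createRecord(name, null, namespace, false)
    result.setFields(copies.asJava)
    result
  }
}
```

With the schemas from the question, `SchemaDiff.diffSchema(schema_one, schema_two, "schema_three", "test")` should produce a record named schema_three in namespace test, so the result does not inherit schema_one's name as the Kite-based version does.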