
How to resolve The datum is not an example of schema (Avro::IO::AvroTypeError)


I am new to Avro with Ruby, and to programming in general.

While I was trying out some basic Avro operations in Ruby, I ran into a problem with the schema. The code is below.

require 'rubygems'
require 'avro'  
require 'mysql2' 
require 'json' 
require 'multi_json'

# setup mysql
db = Mysql2::Client.new(:host => "localhost", :username =>"root",:password=> "root", :database => 'world')

file = File.open('C:/Avro-Spark-Inputs/Serialized_Avro/country_join.avro', 'wb')  

schema = Avro::Schema.parse(File.open("C:/Avro-Spark-Inputs/Schema/country.avsc", "rb").read)

writer = Avro::IO::DatumWriter.new(schema) 

dw = Avro::DataFile::Writer.new(file, writer, schema) 

results = db.query("SELECT * FROM country")

# append each MySQL row (a Hash keyed by column name) to the Avro data file
results.each do |row|
  dw << row
end

# close the avro data file
dw.close

puts "Avro File Created Successfully"

Below is the defined schema.

{
  "type" : "record",
  "namespace" : "country.avro",
  "name" : "country",
  "fields" : [
    {"name": "Code", "type": "string"},
    {"name": "Name", "type": "string"},
    {"name": "Continent", "type": {"name": "Continent", "type":"enum", "symbols": ["Asia", "Europe", "North America", "Africa", "Oceania", "Antarctica", "South America"]}},
    {"name": "Region", "type": "string"},
    {"name": "SurfaceArea", "type": "float"},
    {"name": "IndepYear", "type": "int"},
    {"name": "Population", "type": "int"},
    {"name": "LifeExpectancy", "type": "float"},
    {"name": "GNP", "type": "float"},
    {"name": "GNPOld", "type": "float"},
    {"name": "LocalName", "type": "string"},
    {"name": "GovernmentForm", "type": "string"},
    {"name": "HeadOfState", "type": "string"},
    {"name": "Capital", "type": "int"},
    {"name": "Code2", "type": "string"}
  ]
}

Error Observed:

C:/Ruby23-x64/lib/ruby/gems/2.3.0/gems/avro-1.8.2/lib/avro/io.rb:547:in `write_data': The datum {"Code"=>"ABW", "Name"=>"Aruba", "Continent"=>"North America", "Region"=>"Caribbean", "SurfaceArea"=>193.0, "IndepYear"=>nil, "Population"=>103000, "LifeExpectancy"=>78.4, "GNP"=>828.0, "GNPOld"=>793.0, "LocalName"=>"Aruba", "GovernmentForm"=>"Nonmetropolitan Territory of The Netherlands", "HeadOfState"=>"Beatrix", "Capital"=>129, "Code2"=>"AW"} is not an example of schema {"type":"record","name":"country","namespace":"country.avro","fields":[{"name":"Code","type":"string"},{"name":"Name","type":"string"},{"name":"Continent","type":{"type":"enum","name":"Continent","namespace":"country.avro","symbols":["Asia","Europe","North America","Africa","Oceania","Antarctica","South America"]}},{"name":"Region","type":"string"},{"name":"SurfaceArea","type":"float"},{"name":"IndepYear","type":"int"},{"name":"Population","type":"int"},{"name":"LifeExpectancy","type":"float"},{"name":"GNP","type":"float"},{"name":"GNPOld","type":"float"},{"name":"LocalName","type":"string"},{"name":"GovernmentForm","type":"string"},{"name":"HeadOfState","type":"string"},{"name":"Capital","type":"int"},{"name":"Code2","type":"string"}]} (Avro::IO::AvroTypeError)
    from C:/Ruby23-x64/lib/ruby/gems/2.3.0/gems/avro-1.8.2/lib/avro/io.rb:542:in `write'
    from C:/Ruby23-x64/lib/ruby/gems/2.3.0/gems/avro-1.8.2/lib/avro/data_file.rb:136:in `<<'
    from C:/Avro-Spark-Inputs/Serialization/Country_Serialization.rb:26:in `block in <main>'
    from C:/Avro-Spark-Inputs/Serialization/Country_Serialization.rb:25:in `each'
    from C:/Avro-Spark-Inputs/Serialization/Country_Serialization.rb:25:in `<main>'

I am stuck here and couldn't find any answers online (this may well be a silly question).


Solution

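  • I found the issue!

    The "IndepYear" column contains NULLs in the database (the failing row has "IndepYear"=>nil), but the Avro schema declares the field as a plain "int" with no allowance for null, which caused the error above. Declaring the field as a union that includes null, for example ["null", "int"], lets such rows pass.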
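
To illustrate the fix, here is a minimal sketch that uses only Ruby's standard library. It assumes a cut-down, two-field version of the schema, and null_violations is a hypothetical helper written for this example, not part of the avro gem. It lists the fields whose value is nil while the declared type does not admit null, which is one way to spot the offending column before writing. The nullable declaration is an Avro union, ["null", "int"].

```ruby
require 'json'

# Cut-down field list as in the question: IndepYear is a plain "int".
original_fields = JSON.parse(<<~SCHEMA)
  [
    {"name": "Code", "type": "string"},
    {"name": "IndepYear", "type": "int"}
  ]
SCHEMA

# Same fields, but IndepYear is a union that also accepts null.
fixed_fields = JSON.parse(<<~SCHEMA)
  [
    {"name": "Code", "type": "string"},
    {"name": "IndepYear", "type": ["null", "int"], "default": null}
  ]
SCHEMA

# Hypothetical helper (not an avro gem API): returns the names of fields
# whose value in the row is nil although the field type does not admit null.
def null_violations(fields, row)
  offending = fields.reject do |field|
    type = field['type']
    nullable = type == 'null' || (type.is_a?(Array) && type.include?('null'))
    nullable || !row[field['name']].nil?
  end
  offending.map { |field| field['name'] }
end

# A row shaped like the one in the error message, reduced to two columns.
row = { 'Code' => 'ABW', 'IndepYear' => nil }

p null_violations(original_fields, row) # => ["IndepYear"]
p null_violations(fixed_fields, row)    # => []
```

Any other column that can hold NULL in the table would presumably need the same union treatment. Running the same check against the full field list and a full row would name those columns.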