I am new to Avro with Ruby, and to programming in general.
While trying some basic Avro serialization in Ruby, I ran into a problem with my schema. Here is the code:
require 'rubygems'
require 'avro'
require 'mysql2'
require 'json'
require 'multi_json'
# setup mysql
db = Mysql2::Client.new(:host => "localhost", :username => "root", :password => "root", :database => "world")
file = File.open('C:/Avro-Spark-Inputs/Serialized_Avro/country_join.avro', 'wb')
# File.read closes the schema file once its contents are loaded
schema = Avro::Schema.parse(File.read("C:/Avro-Spark-Inputs/Schema/country.avsc"))
writer = Avro::IO::DatumWriter.new(schema)
dw = Avro::DataFile::Writer.new(file, writer, schema)
results = db.query("SELECT * FROM country")
results.each do |row|
  dw << row
end
# close the avro data file
dw.close
puts "Avro File Created Successfully"
This is the schema I defined:
{
"type" : "record",
"namespace" : "country.avro",
"name" : "country",
"fields" : [
{"name": "Code", "type": "string"},
{"name": "Name", "type": "string"},
{"name": "Continent", "type": {"name": "Continent", "type":"enum", "symbols": ["Asia", "Europe", "North America", "Africa", "Oceania", "Antarctica", "South America"]}},
{"name": "Region", "type": "string"},
{"name": "SurfaceArea", "type": "float"},
{"name": "IndepYear", "type": "int"},
{"name": "Population", "type": "int"},
{"name": "LifeExpectancy", "type": "float"},
{"name": "GNP", "type": "float"},
{"name": "GNPOld", "type": "float"},
{"name": "LocalName", "type": "string"},
{"name": "GovernmentForm", "type": "string"},
{"name": "HeadOfState", "type": "string"},
{"name": "Capital", "type": "int"},
{"name": "Code2", "type": "string"}
]
}
Error Observed:
C:/Ruby23-x64/lib/ruby/gems/2.3.0/gems/avro-1.8.2/lib/avro/io.rb:547:in `write_data': The datum {"Code"=>"ABW", "Name"=>"Aruba", "Continent"=>"North America", "Region"=>"Caribbean", "SurfaceArea"=>193.0, "IndepYear"=>nil, "Population"=>103000, "LifeExpectancy"=>78.4, "GNP"=>828.0, "GNPOld"=>793.0, "LocalName"=>"Aruba", "GovernmentForm"=>"Nonmetropolitan Territory of The Netherlands", "HeadOfState"=>"Beatrix", "Capital"=>129, "Code2"=>"AW"} is not an example of schema {"type":"record","name":"country","namespace":"country.avro","fields":[{"name":"Code","type":"string"},{"name":"Name","type":"string"},{"name":"Continent","type":{"type":"enum","name":"Continent","namespace":"country.avro","symbols":["Asia","Europe","North America","Africa","Oceania","Antarctica","South America"]}},{"name":"Region","type":"string"},{"name":"SurfaceArea","type":"float"},{"name":"IndepYear","type":"int"},{"name":"Population","type":"int"},{"name":"LifeExpectancy","type":"float"},{"name":"GNP","type":"float"},{"name":"GNPOld","type":"float"},{"name":"LocalName","type":"string"},{"name":"GovernmentForm","type":"string"},{"name":"HeadOfState","type":"string"},{"name":"Capital","type":"int"},{"name":"Code2","type":"string"}]} (Avro::IO::AvroTypeError)
from C:/Ruby23-x64/lib/ruby/gems/2.3.0/gems/avro-1.8.2/lib/avro/io.rb:542:in `write'
from C:/Ruby23-x64/lib/ruby/gems/2.3.0/gems/avro-1.8.2/lib/avro/data_file.rb:136:in `<<'
from C:/Avro-Spark-Inputs/Serialization/Country_Serialization.rb:26:in `block in <main>'
from C:/Avro-Spark-Inputs/Serialization/Country_Serialization.rb:25:in `each'
from C:/Avro-Spark-Inputs/Serialization/Country_Serialization.rb:25:in `<main>'
I am stuck here and couldn't find an answer online (it may be a silly question).
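I found the issue!
The "IndepYear" column contains NULLs in the database (note "IndepYear"=>nil in the datum above), but the Avro schema declares it as a plain "int", which does not accept null. That mismatch caused the error above. Declaring the field as a union, {"name": "IndepYear", "type": ["null", "int"]}, lets the record validate.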
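If other columns can also be NULL, they will hit the same error, so it may help to derive the nullable fields from the data instead of editing them one by one. The following is only a rough sketch using plain Ruby and the standard json library: with_nullable_fields is a hypothetical helper (not part of the avro gem), and the two-field schema and sample rows are made up for illustration.

```ruby
require 'json'

# Hypothetical helper (not part of the avro gem): returns a copy of the
# schema hash in which every field that is nil in at least one row is
# declared as a union of "null" and its original type.
def with_nullable_fields(schema, rows)
  # Names of all columns that hold nil in at least one row
  nullable = rows.flat_map { |row| row.select { |_, v| v.nil? }.keys }.uniq

  fields = schema['fields'].map do |field|
    type = field['type']
    next field unless nullable.include?(field['name'])
    # Leave fields alone that already allow null
    next field if type.is_a?(Array) && type.include?('null')
    # flatten(1) merges an existing union; record/enum hashes stay intact
    field.merge('type' => ['null', type].flatten(1))
  end

  schema.merge('fields' => fields)
end

# Reduced, made-up version of the country schema
schema = JSON.parse(<<~JSON)
  {
    "type": "record",
    "name": "country",
    "fields": [
      {"name": "Code", "type": "string"},
      {"name": "IndepYear", "type": "int"}
    ]
  }
JSON

# Rows shaped like the hashes mysql2 returns (string keys, nil for NULL)
rows = [
  { 'Code' => 'ABW', 'IndepYear' => nil },
  { 'Code' => 'AFG', 'IndepYear' => 1919 }
]

patched = with_nullable_fields(schema, rows)
puts JSON.pretty_generate(patched)
```

The patched hash could then be turned back into a string with JSON.generate(patched) and passed to Avro::Schema.parse in place of the file contents. Checking the table definition for nullable columns and editing the .avsc by hand is the more explicit route; this sketch only catches NULLs that actually appear in the rows it is given.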