I have an XLS/CSV file that I'm reading into a pandas DataFrame. I want to generate an Avro schema from this DataFrame.
I'm new to Python as well as pandas. Kindly help.
import pandas as pd
data_frame = pd.read_excel(INPUT_PATH)  # use pd.read_csv(INPUT_PATH) for a CSV file
I want to generate an Avro schema from this DataFrame on the fly. Please help.
I found a solution. I extracted the data type of each field in the pandas DataFrame and stored it against the field name.
I then mapped those data types to Avro-compatible types (for example, 'object' in pandas -> 'string' in Avro).
Finally, I created an Avro schema template, substituted the field names and data types into the 'fields': [] part, and posted it to the registry.
For instance:
# myDict maps each field name to its Avro type; schemaName is the record name
schema = {
    "type": "record",
    "name": schemaName,
    "fields": [
        {"name": key, "type": value} for (key, value) in myDict.items()
    ],
}
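The steps described above can be sketched roughly as follows. This is only a minimal illustration, not a complete type mapping: the names `PANDAS_TO_AVRO` and `build_avro_schema` are hypothetical, the dtype-to-Avro table is an assumed starting point, and unmapped dtypes fall back to 'string'. A plain dict stands in for `data_frame.dtypes` so the snippet runs without an input file.

```python
# Assumed mapping from pandas dtype names to Avro primitive types.
# Extend it for the dtypes that actually occur in your data.
PANDAS_TO_AVRO = {
    "object": "string",
    "int64": "long",
    "int32": "int",
    "float64": "double",
    "float32": "float",
    "bool": "boolean",
}


def build_avro_schema(schema_name, dtypes):
    """Build an Avro record schema from a {column name: dtype} mapping.

    `dtypes` may be a plain dict or a pandas `DataFrame.dtypes` Series;
    both provide .items().
    """
    fields = []
    for column, dtype in dtypes.items():
        # str(dtype) gives names such as "int64" or "object".
        # Unmapped dtypes fall back to "string" (an assumption of this sketch).
        avro_type = PANDAS_TO_AVRO.get(str(dtype), "string")
        fields.append({"name": str(column), "type": avro_type})
    return {"type": "record", "name": schema_name, "fields": fields}


# Stand-in for data_frame.dtypes; with pandas you would pass data_frame.dtypes.
example_dtypes = {"id": "int64", "price": "float64", "label": "object"}
schema = build_avro_schema("ExampleRecord", example_dtypes)
print(schema)
```

With a real DataFrame the call would be `build_avro_schema("ExampleRecord", data_frame.dtypes)`. Columns that contain missing values may need a union type such as `["null", "string"]` instead of a bare primitive.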
The fastavro library can then be used to parse this schema (for example with fastavro.parse_schema).