I have a DataFrame:
Dataset<Row> dataset = getSparkInstance().createDataFrame(newRDD, struct);
Calling dataset.schema() returns a StructType, but I want the actual Avro schema so that I can store it in a sample.avsc file.
Basically, I want to convert a StructType to an Avro schema file (.avsc).
Any ideas?
The code below is a workaround that solves my problem: I write the DataFrame out as an .avro file and then read the schema back from that file.
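import java.io.File;
import org.apache.avro.Schema;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.DatumReader;
import org.apache.spark.sql.SaveMode;

// Write the DataFrame as Avro so that the schema is embedded in the output file.
df.write().mode(SaveMode.Overwrite).format("com.databricks.spark.avro").save("outputPath");

// Find the data file: skip the _SUCCESS marker and derive the name from the .crc checksum file.
File files = new File("outputPath");
String[] children = files.list();
String filename = "";
for (String file : children) {
    if (file.contains("SUCCESS")) {
        continue;
    }
    filename = file;
    if (file.contains(".crc")) {
        // replace() treats ".crc" literally; replaceAll() would read "." as a regex wildcard.
        filename = file.replace(".crc", "");
        if (filename.startsWith(".")) {
            filename = filename.substring(1);
        }
        // Wait until the data file is visible on disk.
        while (!new File("outputPath/" + filename).exists()) {
            System.out.println("outputPath/" + filename);
            Thread.sleep(100);
        }
    }
}
System.out.println(files.getAbsolutePath() + "/" + filename);

// Read the schema back from the Avro file header; try-with-resources closes the reader.
DatumReader<GenericRecord> datumReader = new GenericDatumReader<>();
try (DataFileReader<GenericRecord> dataFileReader =
        new DataFileReader<>(new File("outputPath/" + filename), datumReader)) {
    Schema schema = dataFileReader.getSchema();
    System.out.println(schema.toString());
}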
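
The workaround prints the schema but never writes sample.avsc, and the file lookup depends on the .crc naming. Here is a minimal sketch of those two steps using only java.nio. The class name AvscFiles and both method names are hypothetical helpers of mine, not part of Spark or Avro, and the sketch assumes that the data files end in ".avro" and that marker and checksum files start with "_" or ".".

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;
import java.util.stream.Stream;

// Hypothetical helper class; not part of Spark or Avro.
final class AvscFiles {

    private AvscFiles() {
    }

    // Returns the first (in sorted order) regular file in outputDir whose name
    // ends in ".avro" and does not start with "." or "_". This skips markers
    // such as _SUCCESS and hidden checksum files such as .part-00000.avro.crc.
    static Optional<Path> findAvroDataFile(Path outputDir) throws IOException {
        try (Stream<Path> entries = Files.list(outputDir)) {
            return entries
                    .filter(Files::isRegularFile)
                    .filter(p -> {
                        String name = p.getFileName().toString();
                        return name.endsWith(".avro")
                                && !name.startsWith(".")
                                && !name.startsWith("_");
                    })
                    .sorted()
                    .findFirst();
        }
    }

    // Writes the schema JSON to target as UTF-8, replacing any existing file,
    // and returns the path that was written.
    static Path writeSchemaFile(String schemaJson, Path target) throws IOException {
        return Files.write(target, schemaJson.getBytes(StandardCharsets.UTF_8));
    }
}
```

With the workaround above, the Avro file could be located with AvscFiles.findAvroDataFile(Paths.get("outputPath")), and the schema could be saved with AvscFiles.writeSchemaFile(schema.toString(true), Paths.get("sample.avsc")), where toString(true) asks Avro for pretty-printed JSON.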