spark-avro

How to get the Avro schema from a StructType


I have a DataFrame:

Dataset<Row> dataset = getSparkInstance().createDataFrame(newRDD, struct);

dataset.schema() returns a StructType.

But I want the actual Avro schema, so that I can store it in a sample.avsc file.

Basically, I want to convert the StructType to an Avro schema file (.avsc).

Any ideas?


Solution

  • The code below is the workaround that solved my problem. It writes the DataFrame out as Avro and then reads the schema back from one of the written files.

    // df is the Dataset<Row> whose schema is needed. Imports assumed: java.io.File,
    // org.apache.spark.sql.SaveMode, org.apache.avro.Schema, org.apache.avro.file.DataFileReader,
    // org.apache.avro.generic.GenericDatumReader, org.apache.avro.generic.GenericRecord,
    // org.apache.avro.io.DatumReader.

    // save() blocks until the write has finished, so no polling is needed afterwards.
    df.write().mode(SaveMode.Overwrite).format("com.databricks.spark.avro").save("outputPath");

    // Pick one data file, skipping the _SUCCESS marker and the .crc checksum files.
    File outputDir = new File("outputPath");
    String filename = "";
    for (String file : outputDir.list()) {
        if (!file.contains("SUCCESS") && !file.endsWith(".crc")) {
            filename = file;
            break;
        }
    }
    System.out.println(outputDir.getAbsolutePath() + "/" + filename);

    // Every Avro data file embeds its writer schema in the header; read it back.
    DatumReader<GenericRecord> datumReader = new GenericDatumReader<>();
    try (DataFileReader<GenericRecord> dataFileReader =
             new DataFileReader<>(new File(outputDir, filename), datumReader)) {
        Schema schema = dataFileReader.getSchema();
        // toString(true) pretty-prints the schema JSON, which can be saved as an .avsc file.
        System.out.println(schema.toString(true));
    }
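
The file-picking step in the workaround can also be isolated into a small helper. The following is only a sketch using the standard library: the class name PartFileLocator and the method findDataFile are made up for illustration, and it assumes the usual Spark output layout in which the marker file starts with "_" and checksum files start with ".".

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;
import java.util.stream.Stream;

// Hypothetical helper, not part of Spark or Avro.
public class PartFileLocator {

    /**
     * Returns one data file from a Spark output directory, skipping
     * subdirectories, "_"-prefixed markers such as _SUCCESS, and
     * "."-prefixed hidden files such as .crc checksums.
     * Assumption: data files do not start with "_" or ".".
     */
    public static Optional<Path> findDataFile(Path outputDir) throws IOException {
        try (Stream<Path> entries = Files.list(outputDir)) {
            return entries
                    .filter(Files::isRegularFile)
                    .filter(p -> {
                        String name = p.getFileName().toString();
                        return !name.startsWith(".") && !name.startsWith("_");
                    })
                    .sorted() // deterministic choice: first path in natural order
                    .findFirst();
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : "outputPath");
        if (!Files.isDirectory(dir)) {
            System.out.println("Not a directory: " + dir);
            return;
        }
        System.out.println(findDataFile(dir)
                .map(Path::toString)
                .orElse("No data file found in " + dir));
    }
}
```

The returned path could then be passed to the DataFileReader as `path.toFile()` in place of the hand-built "outputPath/" + filename string. Returning an Optional makes the empty-directory case explicit instead of leaving filename as an empty string.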