How to read/parse *only* the JSON schema from a file containing an avro message in binary format?


I have an avro message in binary format in a file.

Obj^A^D^Vavro.schemaÞ^B{"type":"record","name":"rec","namespace":"ns","fields":[{"name":"id","type":["int","null"]},{"name":"name","type":["string","null"]},{"name":"foo_id","type":["int","null"]}]}^Tavro.codec^Lsnappy^@¤²/n¹¼Bù<9b> à«_^NÌ^W

I'm only interested in the schema. Is there a way to read/parse just the schema from this file? I'm currently parsing the file by hand to extract it, but I was hoping Avro would offer a standard way of doing that.


Solution

  • Avro does provide an API to get the schema from a file:

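        // Needs java.io.File, org.apache.avro.Schema, org.apache.avro.generic.GenericDatumReader,
        // and DataFileReader / FileReader from org.apache.avro.file.
        File file = new File("myFile.avro");

        // try-with-resources closes the underlying file once the schema has been read.
        try (FileReader<?> reader = DataFileReader.openReader(file, new GenericDatumReader<>())) {
            Schema schema = reader.getSchema();
            System.out.println(schema);
        }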
    

    This should match your definition of "just the schema": getSchema() returns the schema stored in the file header, so you never have to iterate over the records. Let me know if it doesn't.

    You could also use the getschema command from avro-tools if you don't need to do it programmatically (substitute the version of the jar you have):

        java -jar avro-tools-<version>.jar getschema myFile.avro
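If you are curious what "parsing by hand" would have to do correctly, here is a minimal sketch that uses only the JDK. It assumes the Avro object container file layout: the 4-byte magic `Obj\x01`, followed by a metadata map of string keys to byte values, with lengths and counts written as zigzag varints. `AvroHeaderSchema` and its methods are hypothetical names for illustration, not part of the Avro library, and the Avro API above remains the safer choice.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper for illustration; not part of the Avro library.
final class AvroHeaderSchema {

    // Decodes one Avro "long": a base-128 varint holding a zigzag-encoded value.
    static long readLong(InputStream in) throws IOException {
        long raw = 0;
        for (int shift = 0; shift < 64; shift += 7) {
            int b = in.read();
            if (b < 0) throw new EOFException("truncated varint");
            raw |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) return (raw >>> 1) ^ -(raw & 1);
        }
        throw new IOException("varint too long");
    }

    // Reads a length-prefixed byte sequence (used for both map keys and values).
    static byte[] readBytes(DataInputStream in) throws IOException {
        long len = readLong(in);
        if (len < 0 || len > Integer.MAX_VALUE) throw new IOException("bad length: " + len);
        byte[] buf = new byte[(int) len];
        in.readFully(buf);
        return buf;
    }

    // Returns the JSON stored under "avro.schema" in the header metadata map.
    static String readSchemaJson(InputStream raw) throws IOException {
        DataInputStream in = new DataInputStream(raw);
        byte[] magic = new byte[4];
        in.readFully(magic);
        if (magic[0] != 'O' || magic[1] != 'b' || magic[2] != 'j' || magic[3] != 1) {
            throw new IOException("not an Avro object container file");
        }
        long count;
        // The map is written as blocks of entries, terminated by a zero count.
        while ((count = readLong(in)) != 0) {
            if (count < 0) {
                count = -count;
                readLong(in); // a negative count is followed by the block's byte size; unused here
            }
            for (long i = 0; i < count; i++) {
                String key = new String(readBytes(in), StandardCharsets.UTF_8);
                byte[] value = readBytes(in);
                if (key.equals("avro.schema")) {
                    return new String(value, StandardCharsets.UTF_8);
                }
            }
        }
        throw new IOException("header has no avro.schema entry");
    }

    // Convenience wrapper for header bytes already held in memory.
    static String schemaFromHeader(byte[] header) {
        try {
            return readSchemaJson(new ByteArrayInputStream(header));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        try (InputStream in = new BufferedInputStream(Files.newInputStream(Paths.get(args[0])))) {
            System.out.println(readSchemaJson(in));
        }
    }
}
```

This lines up with the dump in the question: `^D` is a count of 2 entries, `^V` is the length 11 of `avro.schema`, and `^T` is the length 10 of `avro.codec`. The sketch stops as soon as it finds the schema, so it never touches the compressed data blocks.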