apache-spark, avro, spark-avro

How to write Avro to multiple output directories using Spark


Hi, there is a topic about writing text data into multiple output directories in one Spark job using MultipleTextOutputFormat:

Write to multiple outputs by key Spark - one Spark job

I would like to ask whether there is a similar way to write Avro data to multiple directories.

What I want is to write the data in an Avro file to different directories based on the timestamp field, so that records whose timestamps fall on the same day go to the same directory.


Solution

  • The AvroMultipleOutputs class simplifies writing Avro output data to multiple outputs.

    • Case one: writing to additional outputs other than the job default output. Each additional output, or named output, may be configured with its own Schema and OutputFormat.

    • Case two: writing data to different files whose paths are provided by the user.

    AvroMultipleOutputs supports counters, which are disabled by default. The counter group is the AvroMultipleOutputs class name, and each counter has the same name as its output. Each counter holds the number of records written to that output.
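
    For the per-day layout asked about in the question, case two is probably the relevant one. Below is a minimal sketch of how the user-provided path could be derived from the timestamp. It assumes the timestamp field is epoch milliseconds and that days are cut in UTC; DailyOutputPath and forEpochMillis are hypothetical names used only for illustration. The commented call at the end assumes avro-mapred is on the classpath and uses the write(key, value, baseOutputPath) overload of AvroMultipleOutputs.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

/** Hypothetical helper: maps a record's timestamp to a per-day base output path. */
final class DailyOutputPath {
    // ISO_LOCAL_DATE formats a date as yyyy-MM-dd, e.g. 2015-06-01.
    private static final DateTimeFormatter DAY = DateTimeFormatter.ISO_LOCAL_DATE;

    private DailyOutputPath() {}

    /**
     * Returns "<yyyy-MM-dd>/part" for the UTC day containing the timestamp.
     * Assumption: the timestamp is in epoch milliseconds. "part" is only the
     * file-name prefix; the output format appends its own suffix.
     */
    static String forEpochMillis(long epochMillis) {
        LocalDate day = Instant.ofEpochMilli(epochMillis)
                .atZone(ZoneOffset.UTC)
                .toLocalDate();
        return DAY.format(day) + "/part";
    }
}

// Inside a Reducer (or a map-only Mapper), with `amos` created in setup()
// as `new AvroMultipleOutputs(context)`, the path would be used roughly as:
//
//   amos.write(new AvroKey<>(record), NullWritable.get(),
//              DailyOutputPath.forEpochMillis(timestampMillis));
//
// and amos.close() would be called in cleanup().
```

    All records from the same day map to the same path, so they end up under the same directory below the job output path. If the day boundary should follow a local time zone instead of UTC, the ZoneOffset.UTC argument is the only thing to change.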

    Also have a look at