scala hadoop scalding

Scalding: Output schema from pipe operation


I am reading files on HDFS via Scalding, aggregating on some fields, and writing to a tab-delimited file via Tsv. How can I write out a file that contains the schema of my output file? For example,

UnpackedAvroSource(args("input"))
  .project('key, 'var1)
  .groupBy('key) { _.sum[Long]('var1 -> 'var1sum) }
  .write(Tsv(args("output")))

I want to write an output text file that contains "key, var1sum" so that someone who picks up my output file later knows what the columns are. I'm assuming Scalding doesn't embed this in the file somewhere?

Thanks.


Solution

  • Just found the option writeHeader = true on Tsv, which writes the column names as the first line of the output file, negating the need for writing the schema out to a separate file.
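
For reference, here is a minimal sketch of the job from the question with that option applied. The class name SumWithHeaderJob is hypothetical, the import of UnpackedAvroSource assumes the scalding-avro module is on the classpath, and the source call is copied from the question as-is, so treat this as an illustration rather than a tested job.

```scala
import com.twitter.scalding._
// Assumption: UnpackedAvroSource is provided by the scalding-avro module.
import com.twitter.scalding.avro.UnpackedAvroSource

// Hypothetical job name, used only for this sketch.
class SumWithHeaderJob(args: Args) extends Job(args) {
  // Source call taken unchanged from the question.
  UnpackedAvroSource(args("input"))
    .project('key, 'var1)
    // Sum var1 per key into a new field named var1sum.
    .groupBy('key) { _.sum[Long]('var1 -> 'var1sum) }
    // writeHeader = true asks the Tsv sink to emit the field names as the first line.
    .write(Tsv(args("output"), writeHeader = true))
}
```

With this in place, the output file should start with a tab-separated line naming the columns (key and var1sum), followed by the data rows. If the header does not come out as expected, one thing to try is naming the fields explicitly, e.g. Tsv(args("output"), ('key, 'var1sum), writeHeader = true). When reading such a file back with Scalding, Tsv also takes skipHeader = true so the header line is not parsed as data.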