I have a job that successfully produces a sequential file (CSV) output with a few hundred million rows. Can someone provide an example where the output is written to a hundred separate sequential files, each with a million rows?
What does the Sequential File stage look like, and how is it configured?
The goal is to let QA review any one of the individual outputs without needing a special text editor that can handle very large files.
Based on the suggestion from @Mr. Llama, and with no other solutions forthcoming, we decided on a simple script executed at the end of the scheduled DataStage run.
#!/bin/bash
# usage:
# ./[script] [input]

# check that exactly one input file was provided:
if [ "$#" -ne 1 ]; then
    echo "No input file provided."
    exit 1
fi

# directory for output:
mkdir -p split

# header without content:
head -n 1 "$1" > header.csv

# content without header:
tail -n +2 "$1" > content.csv

# split content into 100000-record files; add '-a 3' if the input
# yields more than 676 pieces, since split's default two-character
# suffixes run out at that point:
split -l 100000 content.csv split/data_

# loop through the new split files, prepending the header
# and adding a '.csv' extension:
for f in split/*; do
    cat header.csv "$f" > "$f.csv"
    rm "$f"
done

# remove the temporary files:
rm header.csv content.csv
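For reference, a run looks like the following (the script and input file names here are illustrative placeholders; split names its pieces data_aa, data_ab, and so on by default):
$ bash ./split_csv.sh job_output.csv
$ ls split/ | head -3
data_aa.csv
data_ab.csv
data_ac.csv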
Crude, but it works for us in this case.