
DataStage and File Formats


Does anyone know how I can determine within a DataStage job whether the input sequential file uses Microsoft (CR LF) or Unix (LF) end-of-line markers, so that the job can route the data through the appropriate path? Thanks


Solution

  • If you just want to make sure you can read all the data regardless of the line endings, you can do it in a single job instead of maintaining separate jobs for DOS and UNIX files. This does not answer the exact question, but it might still solve your problem.

    In the Sequential File stage, you can define the filter sed 's/\r//', which removes the carriage return (\r) from each line and so converts DOS (Windows) line breaks (\r\n) to UNIX/Linux line breaks (\n).

    In the Format tab, set the Record delimiter to UNIX newline.
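    To check what the stage will actually receive, you can run the same filter command in a shell outside DataStage. This is a minimal sketch assuming GNU sed, which understands \r in a regular expression (some other sed implementations do not); the helper name to_unix is hypothetical. Note that s/\r// removes only the first CR on each line, which for CR LF input is the one before the newline.

```shell
#!/bin/sh
# Hypothetical helper: applies the same command as the stage filter.
# Assumes GNU sed, which treats \r as a carriage return.
to_unix() {
    sed 's/\r//'
}

# A two-line sample with DOS (CR LF) line endings, shown byte by byte;
# after the filter, od -c should show only \n at the line ends.
printf 'id;name\r\n1;Alice\r\n' | to_unix | od -c
```

    Because lines without a CR pass through unchanged, the same filter works for UNIX files too, which is what lets one job handle both.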

    Note that defining a filter in a Sequential File stage has some drawbacks:

    • The option First Line is Column Names is not available when using a filter.
      • If your input file has column names in the first line, you need to remove it manually by extending the filter with a tail command: sed 's/\r//' | tail -n +2
    • The option Report Progress is not available when using a filter.
    • You might want to define the last column in your column definitions as VarChar or NVarChar. I have seen strange behaviour with fixed-length types such as NChar or Char as the last column's type. I didn't research this further, but I believe it is related to using a filter.
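    The extended filter for files with a header line can be tried the same way. A minimal sketch, again assuming GNU sed; filter_with_header is a hypothetical name for illustration only.

```shell
#!/bin/sh
# Hypothetical helper: strip the CR from each line, then drop line 1
# (the column names). tail -n +2 prints from the second line onward.
filter_with_header() {
    sed 's/\r//' | tail -n +2
}

printf 'id;name\r\n1;Alice\r\n2;Bob\r\n' | filter_with_header
# prints:
# 1;Alice
# 2;Bob
```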

    Tested in DS 11.7.1
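    If you do need to branch on the line-ending style, as the question asks, one option is to detect it with a small shell check before the job runs. This is a sketch, not something covered by the answer above: it assumes the first line is representative of the whole file, and the helper name line_ending_style is hypothetical. How you call it from DataStage (for example from an Execute Command activity in a job sequence, branching on its output) is also an assumption and untested here.

```shell
#!/bin/sh
# Hypothetical helper: prints "dos" if the first line of the file ends
# with CR LF, otherwise "unix". Only the first line is read, so large
# files are not scanned in full.
line_ending_style() {
    cr=$(printf '\r')   # a literal carriage return character
    if head -n 1 "$1" | grep -q "${cr}\$"; then
        echo dos
    else
        echo unix
    fi
}

# Demo on a temporary DOS-style file.
sample=$(mktemp)
printf 'id;name\r\n1;Alice\r\n' > "$sample"
line_ending_style "$sample"   # prints: dos
rm -f "$sample"
```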