datastage

Parallel job is adding extra columns when outputting to a dataset


The last stage before my dataset is written is a Transformer. It's a lot more complex than this, but the basics are:

  • input = A Integer, B Integer and C Integer
  • output = A Integer, if B > 10 then C else 0 -> C Integer

So, to clarify, column A is just passed through and columns B and C are used to perform a transformation that is called "C" in the final output link.

When I examine the columns being written to the dataset I see A and C. I can save the table definition and this is also just columns A and C. However, when I actually run the job, column B also ends up in the dataset, so I end up with (in any order) columns A, B and C.

I've tried deleting my output dataset, then recreating it, giving it a new name, but it always ends up with that "working column" B in it for some reason I don't fully understand. I don't see how it's picking up a column that isn't in the final output link and choosing to add it against my wishes.

I don't want column B in my dataset. It's wasteful to store it, and it's confusing for developers because it shouldn't be there in the first place. How do I stop DataStage from writing it?


Solution

  • It seems you have Runtime Column Propagation (RCP) activated. With RCP on, all available columns are transferred, regardless of which ones are specified on the output link.

    Go to the stage (Transformer) - Properties - Output tab, where there is a Runtime Column Propagation checkbox, and remove the check mark. In other stages it could be located on the Columns tab instead. The job properties of your job also have a setting that enables RCP for new links. Remove that check mark as well to avoid this problem in future job extensions.

    For more details on RCP check out this.
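
To make the effect concrete, here is a rough conceptual model in Python. It is not DataStage code: the `transform` function and the `rcp_enabled` flag are hypothetical names used only to illustrate the behaviour described above, using the A/B/C columns from the question.

```python
def transform(row, rcp_enabled):
    """Conceptual model of the Transformer output link (not DataStage code)."""
    # Columns explicitly defined on the output link:
    # A is passed through, C is derived from B and C.
    out = {
        "A": row["A"],
        "C": row["C"] if row["B"] > 10 else 0,
    }
    if rcp_enabled:
        # Assumption for illustration: with RCP on, any input column that is
        # not defined on the output link is carried along unchanged.
        for name, value in row.items():
            out.setdefault(name, value)
    return out


row = {"A": 1, "B": 20, "C": 7}
print(sorted(transform(row, rcp_enabled=True)))   # ['A', 'B', 'C']
print(sorted(transform(row, rcp_enabled=False)))  # ['A', 'C']
```

With the flag on, the "working column" B rides along even though the output link only defines A and C, which matches what shows up in the dataset. With it off, only the defined columns are written.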