Search code examples
c#datastage

Surrogate Key Generator state file, is there a way to read file from another prog language?


Good day. I am trying to find a way to read the surrogate key state file to get what is its current value and how to change it. The problem being that the database is being constantly refreshed and I am needing a mechanism where I can get the max value from the table and then set the surrogate key state file.

From what I have been reading its not like the dataset (.ds) files where you can use the DataStage Designer tool to read it. I tried making a small C# application where it would read it as a binary file. Various articles explain that it is an unsigned 64 bit integer. Still when I try to read it, it gives a random set of numbers. It starts with one, then numbers ending in 999, and then it repeats. I tried reading it with the bit converter class but no luck either.

So far the only solution I have seen is to create a parallel or sequential job that gets the max number from the database and then creates the surrogate key with it as explained in http://it.toolbox.com/blogs/infosphere/datastage-8-tutorial-surrogate-key-state-files-17403.

I am not the first one to try changing it through code and was curious if there was some way to do it.

using DataStage 8.7 Tried with C# BinaryReader.ReadUInt64, BinaryReader.ReadInt64 and BitConverter.ConvertToUInt64

Update 2016-10-19: The partial answer is that it can be read as a binary file. It is divided in 4 sets of 8 bytes. Something like this (you can see it with a hex editor.

01 00 00 00 00 00 00 00
00 00 00 00 00 00 00 05
00 00 00 00 00 00 00 08
00 00 00 00 00 00 00 08

The first set I think is the incremental number (+1, +5, etc)

Second Set is the initial value

Third set is the next number to assign

Fourth set I think is the end of the batch to assign. If you are doing 10 by 10 batches then third is 10 and fourth is 20 or that is how I think it works.

So for reading it by code you need to read it with a binary reader and get sets of 8 bytes to convert to UINT64.

The question still stands because I am not sure what they mean.


Solution

  • The reason for all this is that I am looking for a bug in the Surrogate Key State File. So when you go to the complex transformer (sorry there are no pictures), you will go to the properties and see it has a Surrogate Key Tab. You have three settings. One is for the file, one is for the initial value and another is for the block size.

    The file is where current surrogate key is stored. I will explain soon how it is formatted. The Initial value is from what number you wish to start in, and the block size is to reserve a group of numbers for your transformer.

    The file is formatted in 16 byte increments, first one is the current number. The number to assign is this number + 1, and the second is the end of the block size. It will only be 16 bytes when you do not define the initial value or the block size. if you define these it will be 32 bytes. Where the last two values are the current number and the block end.

    So when you have two transformers or more using a the same file. It will assign a block that has numbers available before getting a new block and increasing the file size by 16 more bytes if its needed.

    So what was the error, when you do not define the block size but define an initial value, the system block size will be around 1000 or so. Lets say you do a small example where all you have is a Row Generator connected to a transformer that ends in a sequential file. All you need is one row. Execute it many times, and lets say your initial value is 200. It will be 200,201,203,204,(1),205. For some reason it bugs in DataStage 8.7 and when you do not define the block size it returns back to one.

    I hope this research on the subject will help someone because I looked and looked and there was not much on how to best use Surrogate Keys.

    If you wish for the error to happen faster just delete the file and create a new one with C#, assign 4 UINT64 values saved as BYTES. first two values 1,1,200,300. Eventually it will do what I described.