I've been given the challenge of finding the seed from a series of pseudo-randomly generated alphanumeric IDs. After some analysis, I'm stuck at a dead end that I hope you'll be able to get me out of.
Each ID is obtained by passing the previous one through the encryption algorithm, which I'm supposed to reverse engineer in order to find the seed. The list I was given consists of the first 2070 IDs (without the seed, obviously). The IDs start as 4 alphanumeric characters and switch to 5 after some time (e.g. "2xm1", "34nj", "avdfe", "2lgq9").
This switch happens once the algorithm, after encrypting an ID, returns an ID that has already been generated previously. At this point, it adds one character to this returned ID, making it longer and thus unique. It then proceeds as usual, generating IDs of the new length. This effectively means that the generation algorithm is surjective.
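To make the mechanism concrete, here is a minimal Python sketch of the generation loop as described. The step function `toy_step` is a purely hypothetical stand-in (the real algorithm is exactly what is unknown here), and appending a "0" at the end of a repeated ID is also an assumption, since all I know is that one character gets added.

```python
import string

ALPHABET = string.digits + string.ascii_lowercase  # base36 digits: 0-9 then a-z


def to_base36(n, width):
    """Encode a non-negative integer as a fixed-width base36 string."""
    chars = []
    for _ in range(width):
        n, r = divmod(n, 36)
        chars.append(ALPHABET[r])
    return "".join(reversed(chars))


def toy_step(id_str):
    """Hypothetical stand-in for the unknown algorithm: maps an ID to the next one."""
    width = len(id_str)
    n = int(id_str, 36)
    # n*n + 1 modulo 36**width is deliberately not a bijection, so repeats occur.
    return to_base36((n * n + 1) % (36 ** width), width)


def generate(seed, count):
    """Generate `count` IDs, lengthening an ID by one character when it repeats."""
    seen = set()
    ids = []
    current = seed
    while len(ids) < count:
        candidate = toy_step(current)
        while candidate in seen:
            # Assumption: the extra character is appended at the end; "0" is arbitrary.
            candidate += "0"
        seen.add(candidate)
        ids.append(candidate)
        current = candidate
    return ids
```

With any stand-in of this shape, every generated ID is unique and the ID length never decreases, which matches what I see in the list.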
My first reflex was to convert those IDs from base36 to some other base, notably decimal. I then scatter plotted the IDs' decimal values against their rank in the list, and noticed a pattern whose origin I couldn't understand.
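For reference, the conversion step can be sketched in Python as follows. The helper name `ids_to_points` is just an illustration; the only thing it relies on is the built-in `int(text, 36)`, which reads a string as a base36 number.

```python
def ids_to_points(ids):
    """Return (rank, decimal value) pairs, with the rank starting at 1."""
    return [(rank, int(id_str, 36)) for rank, id_str in enumerate(ids, start=1)]


# The four example IDs quoted above, used here only as sample input.
sample = ["2xm1", "34nj", "avdfe", "2lgq9"]
points = ids_to_points(sample)
```

The resulting pairs are what goes on the X (rank) and Y (decimal value) axes of the scatter plot.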
After splitting the list in two according to ID length, I drew the same scatter plot separately for the 4-character sub-list and the 5-character sub-list, which let me notice the strange density patterns.
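A rough Python sketch of that split, plus a coarse way to put numbers on the density, is given below. The helper names and the choice of 10 equal-width bands are assumptions made for illustration, not part of the challenge.

```python
from collections import Counter, defaultdict


def split_by_length(ids):
    """Group (rank, decimal value) pairs by the length of the ID."""
    groups = defaultdict(list)
    for rank, id_str in enumerate(ids, start=1):
        groups[len(id_str)].append((rank, int(id_str, 36)))
    return dict(groups)


def band_counts(points, length, bands=10):
    """Count the values falling in each of `bands` equal slices of [0, 36**length)."""
    span = 36 ** length
    counts = Counter(value * bands // span for _, value in points)
    return [counts.get(b, 0) for b in range(bands)]


groups = split_by_length(["2xm1", "34nj", "avdfe", "2lgq9"])
four_char_density = band_counts(groups[4], 4)
```

Comparing the band counts of the two sub-lists gives a crude numeric view of where the points cluster, independent of how the plot is drawn.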
After some analysis, I've observed 2 things:
I've tried to correlate such a behavior with other known PRNG scatter-plots, but none of them matched what I get on my graphs.
I'm hoping some of you might know about an encryption method, formula, or function matching such a specific scatter plot, or have any idea about what could be going on behind the scenes.
Thanks in advance for your answers.
This answer may not be very useful, but I think it can help. The plot you've shown most likely doesn't belong to any of the well-known PRNGs, and it certainly wouldn't come from a cryptographic PRNG.
I do have one observation, though I don't know whether it helps. This PRNG seems to have a full period equal to the full cycle of numbers generated for a fixed number of characters. I mean that it operates with a pattern for 4 characters, then repeats that pattern at a higher magnitude for 5 characters, which probably means the same distribution pattern will repeat for 6 characters at a higher magnitude again.
So, in summary, this pattern could be exploited if you know the value of this magnitude: you would then know the increments for the 6-character plot, and you could simply stretch the 5-character plot along the Y-axis to get some kind of solution (which would be the seed for the 6-character plot).
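One way to test the stretching idea is to rescale every value by the size of its own range, so that plots for different ID lengths share the same Y-axis. The sketch below assumes that n-character IDs can span the full base36 range [0, 36**n), which is a guess on my part and not something the data confirms; `normalise` is a name made up for this example.

```python
def normalise(id_str):
    """Map an ID to [0, 1) by dividing its base36 value by 36**length."""
    return int(id_str, 36) / 36 ** len(id_str)


# Under the full-range assumption, each extra character stretches the Y-axis by 36.
range_sizes = {n: 36 ** n for n in (4, 5, 6)}
stretch_5_to_6 = range_sizes[6] // range_sizes[5]
```

If the normalised 4-character and 5-character plots overlay well, that would support the guess that the same pattern simply repeats at a larger scale.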
EDIT: To clarify, regarding your comment: what I mean is that this PRNG generates random numbers, but they will not go on forever without repeating; at some point the same sequence will be regenerated. The piece of information you had inadvertently left out
confirms this: when the generator encounters a number it has produced before (that is, it has reached the point where the same sequence would be regenerated), it just adds 1 extra character to the sequence. That would not change the distribution on the graph, but it would make the graph appear stretched along the Y-axis (as if the Y-intercept of the graph's function just got bigger).
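If that is right, the first longer ID marks the point where the sequence ran into itself, and it may be possible to locate the earlier ID it collided with. The Python sketch below assumes the extra character is inserted somewhere into the repeated ID without altering its other characters; that is a guess, and the helper names and the test list are made up for illustration.

```python
def first_longer_id(ids):
    """Return (rank, id) of the first ID longer than the very first one, or None."""
    base = len(ids[0])
    for rank, id_str in enumerate(ids, start=1):
        if len(id_str) > base:
            return rank, id_str
    return None


def collision_candidates(ids):
    """Earlier IDs equal to the first longer ID with one character dropped.

    Each match is (rank of the earlier ID, the earlier ID, index of the dropped char).
    """
    found = first_longer_id(ids)
    if found is None:
        return []
    rank, longer = found
    earlier = {id_str: r for r, id_str in enumerate(ids[:rank - 1], start=1)}
    matches = []
    for i in range(len(longer)):
        shorter = longer[:i] + longer[i + 1:]
        if shorter in earlier:
            matches.append((earlier[shorter], shorter, i))
    return matches


# Made-up list: the 4th ID repeats the 2nd one with an extra trailing character.
demo = ["aaaa", "bbbb", "cccc", "bbbbx"]
result = collision_candidates(demo)
```

A match would give the rank at which the sequence re-enters itself, and therefore the length of the cycle for that ID length. No match would suggest the extra character is added in some other way.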