
How to identify incremental patterns in a string in Python


I have a one-column DataFrame whose values are strings of randomly generated characters. I would like to write some code that identifies whether any of these strings follow an incremental pattern of some sort. Example:

ebe120xg21
ebe121xg22
vpq17laos
fvut10hals
ebe122xg23

Some of these numbers are clearly incrementing, e.g. 120, 121 and 122. There are also 21, 22 and 23.

How can I efficiently identify this kind of incrementation? The tricky part is that these patterns can appear in any part of the string.
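Because the numbers can sit anywhere in a string, one way to line strings up is to split each one into a "template" (the string with every run of digits replaced by a placeholder) and the list of numbers it contains. Strings that share a template can then be compared number by number. A minimal sketch in plain Python; the helper name `split_template` and the `#` placeholder are assumptions made for illustration:

```python
import re

def split_template(s):
    """Hypothetical helper: split a string into (template, numbers).

    Each run of digits is replaced by '#' in the template and collected,
    as an int, into the numbers list in left-to-right order.
    """
    template = re.sub(r'\d+', '#', s)
    numbers = [int(n) for n in re.findall(r'\d+', s)]
    return template, numbers

for s in ['ebe120xg21', 'ebe121xg22', 'vpq17laos', 'fvut10hals', 'ebe122xg23']:
    # e.g. 'ebe120xg21' -> ('ebe#xg#', [120, 21])
    print(s, split_template(s))
```

Keeping a placeholder where each number was, rather than simply deleting the digits, stops strings such as `ab1c2` and `a1bc2` from being treated as the same pattern.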


Solution

  • Try this:

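    import itertools
    import re

    # Extract every run of digits as an int, e.g. 'ebe120xg21' -> [120, 21]
    df['nums'] = df.yourcolumn.apply(lambda x: [int(i) for i in re.findall(r'\d+', x)])

    # Keep only the non-digit characters, e.g. 'ebe120xg21' -> 'ebexg'; rows sharing this text are compared
    df['text'] = df.yourcolumn.apply(lambda x: ''.join(k for k in x if not k.isdigit()))

    d = {}
    for i in set(df.text):
        dftemp = df[df.text == i]
        ltemp = list(zip(dftemp.index, dftemp.nums))
        # Compare every pair of rows (earlier row first) within the same text group
        for (idx_a, nums_a), (idx_b, nums_b) in itertools.combinations(ltemp, 2):
            # Increment: same count of numbers, and each number is larger in the later row
            if len(nums_a) == len(nums_b) and all(b > a for a, b in zip(nums_a, nums_b)):
                d[(idx_a, idx_b)] = (nums_a, nums_b)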
    

    This produces a dictionary that maps each pair of row indices to their respective numbers wherever the numbers increase from one row to the other. Applied to your data, it gives the following result:

    {(0, 1): ([120, 21], [121, 22]), (0, 4): ([120, 21], [122, 23]), (1, 4): ([121, 22], [122, 23])}
    

    This indicates that the numbers increase between rows (0, 1), (0, 4) and (1, 4).
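The answer compares every pair of rows within a group, so the work grows quadratically with the group size. If "incremental" is narrowed to mean stepping by exactly 1 (an assumption; the code above also accepts jumps such as 120 to 122), a sort-based alternative avoids the pairwise comparison: group the numbers by template and by their position in the string, sort each group once, and scan for runs of consecutive integers. A minimal sketch in plain Python, assuming the strings are available as a list; `consecutive_runs` and its `min_len` parameter are hypothetical names:

```python
import re
from collections import defaultdict

def consecutive_runs(strings, min_len=2):
    """Hypothetical helper: find numbers that step by exactly +1 across strings.

    Strings are grouped by template (digit runs replaced by '#') and by the
    position of each number within the string. Each group is sorted once and
    scanned for runs of consecutive integers, so sorting dominates the cost.
    Returns a list of (template, position, [(value, row), ...]).
    """
    groups = defaultdict(list)
    for row, s in enumerate(strings):
        template = re.sub(r'\d+', '#', s)
        for pos, num in enumerate(re.findall(r'\d+', s)):
            groups[(template, pos)].append((int(num), row))

    runs = []
    for (template, pos), items in groups.items():
        items.sort()
        run = [items[0]]
        for value, row in items[1:]:
            if value == run[-1][0] + 1:
                run.append((value, row))
            else:
                # A gap or a repeated value closes the current run
                if len(run) >= min_len:
                    runs.append((template, pos, run))
                run = [(value, row)]
        if len(run) >= min_len:
            runs.append((template, pos, run))
    return runs

data = ['ebe120xg21', 'ebe121xg22', 'vpq17laos', 'fvut10hals', 'ebe122xg23']
for template, pos, run in consecutive_runs(data):
    print(template, pos, run)
```

With the DataFrame from the question you could pass `df.yourcolumn.tolist()`; the row numbers in each run are then positions in that list, which match `df.index` only for a default integer index. Unlike the pairwise version, this reports each number position separately, so 120/121/122 and 21/22/23 come back as two runs.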