python string string-matching

Best way to find patterns in a string without knowing what I'm looking for?


I have 500x500 bitmaps containing no more than 16 colors that I need to convert to a text file where each color is represented by a character.

I then need to reduce the size of the text file by finding patterns in each line.

I have the characters right now in a 2D array.

For example:

AHAHAH = 3(AH)

HAHAHA = 3(HA)

AAAHHH = 3(A)3(H)

ABYZTT = ABYZ2(T)

AHAHAB = 2(AH)AB

I don't think I can use regular expressions because there are so many possible combinations.

I am not even sure where to begin.
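On the regular-expression point: a single pattern with a backreference can describe "some group followed by one or more copies of itself", so there is no need to list the combinations. The following is only a minimal sketch under that idea, not a tuned solution. The names `REPEAT` and `compress_line` are made up for illustration, and the pattern assumes each line contains no newline characters.

```python
import re

# (.+?) lazily captures the shortest candidate group,
# \1+ then requires one or more further copies of that same group.
REPEAT = re.compile(r"(.+?)\1+")

def compress_line(line):
    """Rewrite each run of a repeated group as count(group)."""
    def replace(match):
        group = match.group(1)
        count = len(match.group(0)) // len(group)
        return f"{count}({group})"
    return REPEAT.sub(replace, line)

for line in ["AHAHAH", "HAHAHA", "AAAHHH", "ABYZTT", "AHAHAB"]:
    print(line, "=", compress_line(line))
```

This reproduces the five examples above. Because the group is matched lazily, the shortest repeating group at each position wins, which is not always the most compact encoding, and heavy backtracking may make it slow on very long lines.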


Solution

  • Here is what I did to solve my problem. I haven't thoroughly checked edge cases, but it works on my test inputs. Maybe it will be helpful for someone in the future. It's run-length encoding, but for groups of characters rather than individual characters. From what I read, normal RLE would encode AAAAHAHA as A4H1A1H1A1, whereas I needed to encode it as 4A2HA.

    string = 'AHYAHYAHAHAHAHAHAHAHBBBBBBBTATAZAB*+I'
    new_string = ""
    i = 1  # length of the group currently being tested
    while string:
        if i > len(string) // 2:
            # A group longer than half of what is left cannot repeat,
            # so copy one character through unchanged and start over.
            new_string += string[0]
            string = string[1:]
            i = 1
        elif string[:i] == string[i:2 * i]:
            # The first i characters repeat; count the consecutive copies.
            count = 2
            while string[count * i:(count + 1) * i] == string[:i]:
                count += 1
            new_string += "(" + str(count) + ")" + string[:i]
            string = string[count * i:]
            i = 1
        else:
            i += 1

    print(new_string)
    (2)AHY(7)AH(7)B(2)TAZAB*+I
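One thing to watch with the output above: `(2)TAZAB*+I` does not say where the repeated group ends, so it cannot be decoded unambiguously (it could stand for `TATAZAB*+I` or `TAZABTAZAB*+I`). A possible variation, sketched here, runs the same kind of greedy scan but writes the `count(group)` notation from the question, which can be expanded again. The function names `encode` and `decode` are hypothetical, and the decoder assumes the pixel characters never include digits or parentheses.

```python
import re

def encode(line):
    """Greedy scan: the shortest repeating group at the front wins."""
    out = []
    i = 1  # length of the group currently being tested
    while line:
        if i > len(line) // 2:
            # No group this long can repeat in what is left.
            out.append(line[0])
            line = line[1:]
            i = 1
        elif line[:i] == line[i:2 * i]:
            count = 2
            while line[count * i:(count + 1) * i] == line[:i]:
                count += 1
            out.append(f"{count}({line[:i]})")
            line = line[count * i:]
            i = 1
        else:
            i += 1
    return "".join(out)

def decode(text):
    """Expand every count(group) back into its repeated characters.

    Assumes the original data contains no digits or parentheses.
    """
    return re.sub(r"(\d+)\(([^()]*)\)",
                  lambda m: m.group(2) * int(m.group(1)),
                  text)

sample = "AHYAHYAHAHAHAHAHAHAHBBBBBBBTATAZAB*+I"
packed = encode(sample)
print(packed)                    # 2(AHY)7(AH)7(B)2(TA)ZAB*+I
print(decode(packed) == sample)  # True
```

For the 2D array, each row can be joined into a string first, for example `encode("".join(row))`, and a round trip through `decode` gives a cheap way to check edge cases.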