python decode txt

decode string "2A4CT2A2C..." into "AACCCCTAACC..." from a text file to another


I have a file doc.txt whose content looks like "2A4CT2A2C..." and I want to decode it to "AACCCCTAACC..." and then write the result to another file, doc1.txt. I have tried:

(origin and destination are the paths of the docs)

def decode_txt(origin, destination):
    h = open(destination, "w")
    f = open(origin, "r")
    for character in f:
        h.write()

but I couldn't work out how to continue.


Solution

  • The input is a sequence of tokens, each made of zero or more digits followed by a single character. A regular expression can handle that: (\d*) captures the optional repeat count, and ([^\d]) captures the single non-digit character to repeat. When the count is empty, the character is written once.

    import re
    
    def decode_txt(origin, destination):
        with open(origin) as infile:
            text = infile.read()
        with open(destination, "w") as outfile:
            # Each match is a (count, char) pair; an empty count means "write once".
            for cnt, char in re.findall(r"(\d*)([^\d])", text):
                outfile.write(char * (int(cnt) if cnt else 1))
    
    test = "2A4CT2A2C"
    with open("origin", "w") as f:
        f.write(test)
    decode_txt("origin", "destination")
    with open("destination") as f:
        result = f.read()
    print(result)
    assert result == "AACCCCTAACC"
    

    If you only need string input and output, this reduces to:

    import re
    
    text = "2A4CT2A2C"
    out = []
    for cnt, char in re.findall(r"(\d*)([^\d])", text):
        # Append each expanded run as one string; an empty count means 1.
        out.append(char * (int(cnt) if cnt else 1))
    out = "".join(out)
    

    If you have a lot of text, the out list will be large. You could use io.StringIO() to create a file-like buffer instead.
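
The io.StringIO() suggestion could look roughly like the sketch below. The function name decode is hypothetical (it is not part of the answer's code), and the pattern is the same one described earlier, written with \D as shorthand for [^\d]. The buffer is written to like a file, and getvalue() returns the accumulated string at the end.

```python
import io
import re

def decode(text):
    """Expand run-length-encoded text, e.g. "2A4CT" -> "AACCCCT".

    `decode` is a hypothetical helper name used only for this sketch.
    """
    buf = io.StringIO()  # in-memory, file-like text buffer
    # Each match is a (count, char) pair; an empty count means "write once".
    for cnt, char in re.findall(r"(\d*)(\D)", text):
        buf.write(char * (int(cnt) if cnt else 1))
    return buf.getvalue()

print(decode("2A4CT2A2C"))  # AACCCCTAACC
```

Because the buffer exposes the same write() method as a file object, the loop body is identical to the file-based version, so switching between writing to disk and building a string only changes where the output object comes from.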