Tags: python, python-3.x, pandas, replace, python-re

Python re: why isn't my code for replacing values in a file throwing errors or changing the values?


I need to replace certain values in an external file ("file.txt") with new values from a Pandas dataframe. The external file contents look like:

(Many lines of comments, then)
identifier1       label2 = i \ label3        label4                                  
label5
A1 = -5563.88 B2 = -4998 C3 = -203.8888 D4 = 5926.8 
E5 = 24.99876 F6 = 100.6666 G7 = 30.008 H8 = 10.9999
J9 = 1000000 K10 = 1.0002 L11 = 0.1
M12

identifier2       label2 = i \ label3        label4                                  
label5
A1 = -788 B2 = -6554 C3 = -100.23 D4 = 7526.8 
E5 = 20.99876 F6 = 10.6666 G7 = 20.098 H8 = 10.9999
J9 = 1000000 K10 = 1.0002 L11 = 0.000
M12
...

From previous posts here, this resource, and Python's "re", I'm trying:

findThisIdentifierInFile = "identifier1" # I want the data immediately below this identifier in the external file

with open("file.txt", "r") as file:
    file_string = file.read()

    i = -500 # New A1 value (i.e., I want to replace the A1 value in the file with -500).
    j = 100  # New C3 value.  

    string1 = re.sub(
        rf"^({findThisIdentifierInFile}\s.*?)A1 = \S+ C3 = \S+",
        f"\g<1>A1 = {i} C3 = {j}",
        string1,
        flags=re.M | re.S,
    )  

When I run this, there are no errors, but nothing happens. For example, when I print "string1", the data are identical to those in the original "file.txt". I can't provide more of the code but hope that someone who is experienced with RegEx and re (Python) will be able to spot where I have gone wrong. I apologize in advance because I'm certain to have done something silly.

Sometimes I will also want to replace the B2 value and the E5 - H8 values and values on the other lines. I'm wondering whether there's a more foolproof/newbie-friendly method I could use to do any possible replacement of values immediately below a particular identifying label.


Solution

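  • Your pattern finds nothing because "A1 = \S+ C3 = \S+" requires C3 to follow A1 directly, but in the file B2 sits between them. With no match, re.sub returns its input unchanged and raises no error. Also note that the snippet passes string1 to re.sub rather than the file_string it just read. IIUC, you can do the string replacement in multiple steps, e.g.: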

    import re
    
    text = r"""
    (Many lines of comments, then)
    identifier1       label2 = i \ label3        label4
    label5
    A1 = -5563.88 B2 = -4998 C3 = -203.8888 D4 = 5926.8
    E5 = 24.99876 F6 = 100.6666 G7 = 30.008 H8 = 10.9999
    J9 = 1000000 K10 = 1.0002 L11 = 0.1
    M12
    
    identifier2       label2 = i \ label3        label4
    label5
    A1 = -788 B2 = -6554 C3 = -100.23 D4 = 7526.8
    E5 = 20.99876 F6 = 10.6666 G7 = 20.098 H8 = 10.9999
    J9 = 1000000 K10 = 1.0002 L11 = 0.000
    M12
    ..."""
    
    
    def my_replace_function(g):
        i = -500  # New A1 value
        j = 100  # New C3 value
    
        s = g.group(2)
    
        s = re.sub(r"A1 = \S+", f"A1 = {i}", s)
        s = re.sub(r"C3 = \S+", f"C3 = {j}", s)
    
        return g.group(1) + s + "\n\n"
    
    
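    findThisIdentifierInFile = "identifier1"
    # Group 1 is the identifier; group 2 is everything after it up to the
    # first blank line, which is assumed to end the block.
    text = re.sub(
        rf"^({findThisIdentifierInFile})(.*?)\n\n",
        my_replace_function,
        text,
        flags=re.M | re.S,
    )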
    print(text)
    

    Prints:

    
    (Many lines of comments, then)
    identifier1       label2 = i \ label3        label4
    label5
    A1 = -500 B2 = -4998 C3 = 100 D4 = 5926.8
    E5 = 24.99876 F6 = 100.6666 G7 = 30.008 H8 = 10.9999
    J9 = 1000000 K10 = 1.0002 L11 = 0.1
    M12
    
    identifier2       label2 = i \ label3        label4
    label5
    A1 = -788 B2 = -6554 C3 = -100.23 D4 = 7526.8
    E5 = 20.99876 F6 = 10.6666 G7 = 20.098 H8 = 10.9999
    J9 = 1000000 K10 = 1.0002 L11 = 0.000
    M12
    ...
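
For the more general case raised in the question (replacing any of the values below a given identifier), the same two-step idea can take a dictionary of new values. The following is only a sketch: the names replace_values, update_block and new_values are hypothetical, and it assumes each block runs from the identifier line to the next blank line (or the end of the text) and that every entry has the form "NAME = value" with no spaces inside the value.

```python
import re


def replace_values(text, identifier, new_values):
    """Replace "NAME = value" pairs in the block that starts with `identifier`.

    Hypothetical helper: a block is assumed to run from the identifier
    line to the next blank line, or to the end of the text.
    """

    def update_block(match):
        block = match.group(0)
        for name, value in new_values.items():
            # \b stops "A1" from matching inside a longer name such as "AA1".
            # A function is used as the replacement so that backslashes in
            # the new value are never interpreted as regex escapes.
            block = re.sub(
                rf"\b{re.escape(name)} = \S+",
                lambda m: f"{name} = {value}",
                block,
                count=1,
            )
        return block

    # Lazily match from the identifier up to (not including) a blank line.
    pattern = rf"^{re.escape(identifier)}\b.*?(?=\n[ \t]*\n|\Z)"
    return re.sub(pattern, update_block, text, count=1, flags=re.M | re.S)


sample = (
    "identifier1 label2\n"
    "A1 = -5563.88 B2 = -4998 C3 = -203.8888\n"
    "E5 = 24.99876 F6 = 100.6666\n"
    "\n"
    "identifier2 label2\n"
    "A1 = -788 B2 = -6554 C3 = -100.23\n"
)

updated = replace_values(sample, "identifier1", {"A1": -500, "C3": 100, "E5": 25})
print(updated)
```

Only the identifier1 block changes; the identifier2 block is left as it was. To apply this to a real file, read it into a string first, pass that string in, and write the returned string back out, since re.sub never modifies the file itself.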