
How to skip a value when replacing values in an external file with values from a dataframe?


I have a file that's made up of entries like:

A       first = 4 | 1_3_5_4        Name1                                  
labelToSkip
i = 1000000 j = -3 k = -15
end

B       first = 4 | 9_2_2_4        Name2                                  
labelToSkip
i = 150000 j = -3 k = -20
end
...

I previously asked how to replace certain values in the file with the corresponding values from a Pandas dataframe like:

    i      j      k     
0   unit1  unit2  unit3
1   1000   100    84      
2  -3000   200    60       
3  -2000   90     195      
4   900    40     209 

The earlier question was: How to write specific values from a Pandas (Python) dataframe to a specific place in a file (i.e., after an identifier)?

I got a great solution. However, I have some lines in my file where only the i and k values need to be replaced. I.e., the dataframe looks like:

    i      k     
0   unit1  unit3
1   1000   84      
2  -3000   60       
3  -2000   195      
4   900    209 

So I would want this result (for example). Here, I use the third row of values from the dataframe to replace only the values in the "i" and "k" fields of "B" in the file:

A       first = 4 | 1_3_5_4        Name1                                  
labelToSkip
i = 1000000 j = -3 k = -15
end

B       first = 4 | 9_2_2_4        Name2                                  
labelToSkip
i = -2000 j = -3 k = 195
end
...

However, in this situation, nothing happens when I run the solution. I have tried changing "idx" to 2. I even tried changing idx to "1" and running it for only i (having removed anything related to j and k) and then k. That doesn't work either. I haven't been able to find anything online about how to ignore/skip a field. If anyone has a hint, I would be grateful.


Solution

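  • Adapt the solution so that it matches each field individually instead of one fixed line containing all the fields. That way, a field that is missing from the dataframe (here j) is simply left alone. The file is split on \n\n so that each block is handled separately, and only the block that starts with to_replace is modified.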

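    import re
    
    idx = 3            # row of the dataframe to take the new values from
    to_replace = 'B'   # label of the block to update
    
    # Parenthesized context managers are officially supported from Python 3.10
    with (open('input_file.txt', 'r') as f_in,
          open('output_file.txt', 'w') as f_out):
        s = df.loc[idx]
        # Match "name = value" only for the fields present in the dataframe;
        # -? lets the existing value be a negative integer such as -20
        pat = r'\b(%s)\b(\s*=\s*)-?\d+' % '|'.join(map(re.escape, s.index))
        
        f_out.write('\n\n'.join(
            re.sub(pat, lambda m: f'{m.group(1)}{m.group(2)}{s.loc[m.group(1)]}',
                   block)
            if block.startswith(to_replace) else block
            for block in f_in.read().split('\n\n')
        ))
    

    Output:

    A       first = 4 | 1_3_5_4        Name1
    labelToSkip
    i = 1000000 j = -3 k = -15
    end
    
    B       first = 4 | 9_2_2_4        Name2
    labelToSkip
    i = -2000 j = -3 k = 195
    end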
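
A minimal, self-contained sketch of the same per-field approach, useful for checking the behavior without touching any files. The function name `replace_fields`, the in-memory `text`, and the plain dict `new_values` (standing in for row 3 of the dataframe) are assumptions made for illustration and are not part of the original answer.

```python
import re

# Hypothetical in-memory stand-in for the input file.
text = (
    "A       first = 4 | 1_3_5_4        Name1\n"
    "labelToSkip\n"
    "i = 1000000 j = -3 k = -15\n"
    "end\n"
    "\n"
    "B       first = 4 | 9_2_2_4        Name2\n"
    "labelToSkip\n"
    "i = 150000 j = -3 k = -20\n"
    "end"
)

# Hypothetical stand-in for row 3 of the dataframe; "j" is absent on purpose.
new_values = {"i": -2000, "k": 195}


def replace_fields(text, values, label):
    """Replace `name = value` pairs in the block that starts with `label`.

    Only names that are keys of `values` are touched; all other fields
    are left exactly as they are.
    """
    names = "|".join(re.escape(str(name)) for name in values.keys())
    # -? lets the pattern match negative integers as well as positive ones.
    pattern = re.compile(rf"\b({names})\b(\s*=\s*)-?\d+")

    def substitute(match):
        # Keep the field name and the original spacing around "=".
        return f"{match.group(1)}{match.group(2)}{values[match.group(1)]}"

    # Blocks are separated by an empty line, as in the file from the question.
    return "\n\n".join(
        pattern.sub(substitute, block) if block.startswith(label) else block
        for block in text.split("\n\n")
    )


result = replace_fields(text, new_values, "B")
print(result)
```

With a real dataframe, `df.loc[idx].to_dict()` can be passed as `values`. Fields that are missing from the mapping, such as `j` here, never appear in the pattern, so they are skipped without any special handling.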