
Set of lines to replace in file - python


I am new to Python. I am trying to use a file with new data (newprops) to replace the old data in a second file. Both files are over 3 MB.

File with new data looks like this:

PROD    850 30003   0.096043  
PROD    851 30003   0.096043  
PROD    853 30003   0.096043  
PROD    852 30003   0.096043  
....

Original file with old data is something like:

CROD    850     123456 123457 123458 123459  
PROD    850     30003   0.08  
CROD    851     123456 123457 123458 123459  
PROD    851     30003   0.07  
CROD    852     123456 123457 123458 123459  
PROD    852     30003   0.095  
CROD    853     123456 123457 123458 123459  
PROD    853     30003   0.095  
....

Output should be:

CROD    850     123456 123457 123458 123459  
PROD    850     30003   0.096043  
CROD    851     123456 123457 123458 123459  
PROD    851     30003   0.096043  
CROD    852     123456 123457 123458 123459  
PROD    852     30003   0.096043  
CROD    853     123456 123457 123458 123459  
PROD    853     30003   0.096043  
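As the samples are shown here, the new-data file separates the ID and property number with a single space (PROD    850 30003) while the original uses several (PROD    850     30003), so comparing the raw text of the first three fields would not line up. A minimal sketch, assuming the fields are always whitespace-separated, that keys PROD lines on their split fields instead (prod_key is a hypothetical helper name):

```python
def prod_key(line):
    """Return (card, id, prop_id) for a PROD line, or None for any other line."""
    # str.split() with no argument treats any run of whitespace as one separator
    fields = line.split()
    if len(fields) >= 3 and fields[0] == "PROD":
        return tuple(fields[:3])
    return None

new_line = "PROD    850 30003   0.096043  "
old_line = "PROD    850     30003   0.08  "
print(prod_key(new_line))                        # ('PROD', '850', '30003')
print(prod_key(new_line) == prod_key(old_line))  # True
print(prod_key("CROD    850     123456 123457 123458 123459"))  # None
```

Because the key ignores spacing, a line from either file maps to the same tuple, which makes it usable as a dictionary key for lookups.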

Here's what I have so far:

import fileinput

def prop_update(newprops,bdffile):

    fnewprops=open(newprops,'r')
    fbdf=open(bdffile,'r+')
    newpropsline=fnewprops.readline()

    while len(newpropsline)>0:
        fbdf.seek(0)
        fbdfline=fbdf.readline()
        propname=newpropsline.split()[1]
        propID=newpropsline.split()[2]
        while len(fbdfline)>0:
            if propname in fbdfline and propID in fbdfline:
                fbdf.write(newpropsline) #i'm stuck here... I want to delete the old line and use updated value
            fbdfline=fbdf.readline()

        newpropsline=fnewprops.readline()

    fnewprops.close()
    fbdf.close()

Please help!
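Part of why the inner loop gets stuck: a file opened for update ('r+') lets you write at the current position, but a write only replaces existing bytes; it never deletes or inserts them. A small illustration using a throwaway temporary file (the two-line sample is made up in the same layout as the data above):

```python
import os
import tempfile

def overwrite_demo():
    """Show that writing into the middle of a file overwrites bytes, never inserts them."""
    # Hypothetical two-line sample in the same layout as the original data
    original = b"PROD    850     30003   0.08\nCROD    851\n"
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(original)
        # 'r+b' allows writing at an arbitrary position, but only by replacing bytes
        with open(path, "r+b") as f:
            f.seek(original.index(b"0.08"))  # start of the old value
            f.write(b"0.096043")  # 8 bytes written over the 4-byte "0.08" and beyond
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)

print(overwrite_demo())
```

This prints b'PROD    850     30003   0.096043D    851\n': the longer value ran over the newline and the start of the next line. That is why the usual approach is to write the updated lines to a new file rather than edit the original in place.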


Solution

  • You can use a dict to index the new data. Then write the original file to a new file, line by line, updating data from the index as you go. It looks like the first three items should be the key ("PROD 850 30003") and they can be pulled out with a regex such as (PROD\s+\d+\s+\d+).

    import re
    _split_new = re.compile(r"(PROD\s+\d+\s+\d+)(.*)")
    
    def _normalize(key):
        # the two files space their fields differently ("850 30003" vs "850     30003"),
        # so collapse runs of whitespace before using the key for lookups
        return " ".join(key.split())
    
    # create an index for the PROD items to be updated
    
    # this might be a bit more understandable...
    #with open('updates.txt') as updates:
    #    new_data = {}
    #    for line in updates:
    #        match = _split_new.match(line)
    #        if match:
    #            key, value = match.groups()
    #            new_data[_normalize(key)] = value
    
    # ... but this is more compact
    with open('updates.txt') as updates:
        new_data = {_normalize(match.group(1)): match.group(2)
            for match in (_split_new.match(line) for line in updates)
            if match}
    
    # then process the updates
    with open('origstuff.txt') as orig, open('newstuff.txt', 'w') as newstuff:
        # for each line in the original...
        for line in orig:
            match = _split_new.match(line)
            # ... see if it's a PROD line
            if match:
                key, value = match.groups()
                # ... and rewrite with the value from the index (defaulting to the current value)
                newstuff.write("%s%s\n" % (key, new_data.get(_normalize(key), value)))
            else:
                # ... or just copy the original line
                newstuff.write(line)
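The question already imports fileinput. If the goal is to update the original file rather than produce a separate output file, fileinput.input(..., inplace=True) can handle the "write a new file, then replace the old one" step: while iterating, anything printed to standard output goes into the file being rewritten. A sketch under stated assumptions (update_in_place is a hypothetical name, keys are normalized for whitespace because the two files space their fields differently, and with the default backup='' no backup copy is kept, so work on a copy first):

```python
import fileinput
import re

# Key is "PROD <id> <propID>", the rest of the line is the value
_split_new = re.compile(r"(PROD\s+\d+\s+\d+)(.*)")

def _normalize(key):
    # Compare keys on single-spaced fields so differing spacing still matches
    return " ".join(key.split())

def update_in_place(updates_path, bdf_path):
    # Index the new values by normalized key
    new_data = {}
    with open(updates_path) as updates:
        for line in updates:
            match = _split_new.match(line)
            if match:
                new_data[_normalize(match.group(1))] = match.group(2)
    # With inplace=True, print() output replaces the contents of bdf_path
    with fileinput.input(files=(bdf_path,), inplace=True) as f:
        for line in f:
            match = _split_new.match(line)
            if match:
                key, value = match.groups()
                # keep the original key spacing, swap in the new value if there is one
                print(key + new_data.get(_normalize(key), value))
            else:
                print(line, end="")  # line already ends with its newline
```

Usage would be update_in_place('updates.txt', 'origstuff.txt'). PROD lines with no entry in the updates file keep their current value, as in the answer's version.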