Tags: python, python-3.x, replace, extract, slice

Python: for a long string where a certain word is repeated, how to identify the first occurrence of the word after a unique word?


I have a large file that is made up of many blocks of data. For example, two blocks of data would look like:

name1   1234567           comment                           
property1 = 1234567.98765 property2 = 1234567.98765
property3 = 1234567.98765
final

name2   1234568           comment                           
property1 = 987654.321 property2 = 9876543.0
property3 = 1234567.98765
final
...

Problem. I have code to modify one block of data. However, the code results in a string (updated_string) that contains ALL data blocks in the file (the modified data block and all other unmodified data blocks).

Goal. I want updated_string to contain only the modified data block. I then want to write updated_string to an external file, leaving all other data blocks in the original file unmodified.

So far I have figured out from previous posts here how to delete everything from updated_string that comes before the modified data block. For example, if the second data block has been modified, I would do:

mystring = "name2"
begin = string.find(mystring)
string[begin:]   # everything from "name2" onward

However, I am not able to delete everything after the "final" in the data block I want. I know I can do

mystring2 = "final"
stop = string.find(mystring2)
string[:stop + len(mystring2)]   # everything up to and including "final"

but this finds the first "final" in the whole string, not the one that belongs to the data block I want. Can anyone please suggest how I might look for the first "final" after name2, so that I get a string made up of only the data block I want?


Solution

  • The logic is not fully clear, but assuming you want the block that starts at name2 and ends at the first final that follows it, adapting your current logic should work:

    mystring = "name2"
    begin = string.find(mystring)
    string = string[begin:]         # we drop all before mystring
    
    mystring2 = "final"
    stop = string.find(mystring2)   # now we find the stop in the new string
    string = string[:stop+len(mystring2)]
    

    Or, better, use the start parameter of str.find:

    mystring = "name2"
    begin = string.find(mystring)
    
    mystring2 = "final"
    # search for the stop word only after
    # the position of the start word (+ its length)
    stop = string.find(mystring2, begin+len(mystring))
    
    out = string[begin:stop+len(mystring2)]
    

    Output:

    name2   1234568           comment                           
    property1 = 987654.321 property2 = 9876543.0
    property3 = 1234567.98765
    final
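
One caveat with both snippets above: str.find returns -1 when the word is not found, and the slices then silently return the wrong text instead of failing. The sketch below wraps the second approach in a function that checks for this. The name extract_block, its parameters, and the sample text are hypothetical, chosen only for illustration. It also assumes the start word does not appear earlier in the string as part of a longer word (for example "name20").

```python
def extract_block(text, start_word, end_word="final"):
    """Return the text from start_word through the first end_word after it.

    Hypothetical helper, not part of the original answer. Raises ValueError
    if either marker is missing, because str.find reports that with -1.
    """
    begin = text.find(start_word)
    if begin == -1:
        raise ValueError(f"start marker {start_word!r} not found")

    # Search for the end marker only after the start marker.
    stop = text.find(end_word, begin + len(start_word))
    if stop == -1:
        raise ValueError(f"no {end_word!r} found after {start_word!r}")

    # Include the end marker itself in the result.
    return text[begin:stop + len(end_word)]


# Sample data modeled on the two blocks in the question.
sample = (
    "name1   1234567           comment\n"
    "property1 = 1234567.98765 property2 = 1234567.98765\n"
    "property3 = 1234567.98765\n"
    "final\n"
    "\n"
    "name2   1234568           comment\n"
    "property1 = 987654.321 property2 = 9876543.0\n"
    "property3 = 1234567.98765\n"
    "final\n"
)

block = extract_block(sample, "name2")
print(block)
```

The returned string can then be written to a separate file with the usual open(..., "w") and write calls, which leaves the original file untouched.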