Way to substitute only part of a regex string in Python


I am working with a text file that has text laid out like below:

SCN DD1251       
            UPSTREAM               DOWNSTREAM               FILTER              
          NODE     LINK          NODE    LINK                LINK               
        DD1271      C           DD1271    R                                     
        DD1351      D           DD1351    B                                     
                    E                                                           
                                                                                
SCN DD1271       
            UPSTREAM               DOWNSTREAM               FILTER              
          NODE     LINK          NODE    LINK                LINK               
        DD1301      T           DD1301    A                                     
        DD1251      R           DD1251    C                                     
                                                                                
SCN DD1301       
            UPSTREAM               DOWNSTREAM               FILTER              
          NODE     LINK          NODE    LINK                LINK               
        DD1271      A           DD1271    T                                     
                    B                                                           
                    C                                                           
                    D                                                           
                                                                                
SCN DD1351       
            UPSTREAM               DOWNSTREAM               FILTER              
          NODE     LINK          NODE    LINK                LINK               
                    A           DD1251    D                                     
        DD1251      B                                                           
                    C   

I am currently using the following regex pattern to match a node, the 5-wide space after it, and the letter that follows, like so:

DD1251      B

[A-Z]{2}[0-9]{3}[0-9A-Z]      [A-Z]

My goal is to replace the 5-wide space with an underscore, so that it looks like this:

DD1251_B

I am trying to achieve this using the following code:

import re

def RemoveLinkSpace(input_file, output_file, pattern):
    with open(str(input_file) + ".txt", "r") as file_input:
        with open(str(output_file) + ".txt", "w") as output:
            for line in file_input:
                line = pattern.sub("_", line)
                output.write(line)

upstream_pattern = re.compile(r"[A-Z]{2}[0-9]{3}[0-9A-Z]      [A-Z]")

RemoveLinkSpace("File1","File2",upstream_pattern)

However, this results in a text file that looks like the below pattern:

SCN DD1251       
            UPSTREAM               DOWNSTREAM               FILTER              
          NODE     LINK          NODE    LINK                LINK               
        _      C           DD1271    R                                     
        _      D           DD1351    B                                     
                    E                                                           
                                                                                
SCN DD1271       
            UPSTREAM               DOWNSTREAM               FILTER              
          NODE     LINK          NODE    LINK                LINK               
        _      T           DD1301    A                                     
        _      R           DD1251    C      

                           

My question is: is there a way to still search for the entire regex, but replace only the spaces contained within it?


Solution

  • You can replace by capture group, which is the piece you were missing. In the replacement string, \1 refers to the first group and \2 to the second. In the search pattern, ([A-Z]{2}[0-9]{3}[0-9A-Z]) is the first group and ([A-Z]) is the second.
    Also, the gap between group 1 and group 2 is 6 spaces, not 5, so the pattern matches a run of 5 or more consecutive spaces.

    import re

    def RemoveLinkSpace(input_file, output_file, pattern):
        with open(str(input_file) + ".txt", "r") as file_input:
            with open(str(output_file) + ".txt", "w") as output:
                for line in file_input:
                    # Keep the node (\1) and the link letter (\2); only the spaces between them become "_"
                    line = pattern.sub(r"\1_\2", line)
                    output.write(line)

    # Group 1: node, then 5 or more spaces, then group 2: link letter
    upstream_pattern = re.compile(r"([A-Z]{2}[0-9]{3}[0-9A-Z])[ ]{5,}([A-Z])")

    RemoveLinkSpace("in", "out", upstream_pattern)
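
To check the group-based replacement without touching any files, it can be run on a single line. The sketch below is only illustrative: the sample line is rebuilt by hand from the first data row in the question, and the names `node_link`, `spaces_only`, and `sample` are made up for this example. It also shows a lookaround variant, which is an alternative to the answer's approach rather than part of it: the node and the letter are asserted but not consumed, so the match covers only the spaces.

```python
import re

# Same pattern as in the answer: group 1 is the node, group 2 is the link letter.
node_link = re.compile(r"([A-Z]{2}[0-9]{3}[0-9A-Z]) {5,}([A-Z])")

# Alternative (not from the answer): a fixed-width lookbehind and a lookahead
# check for the node and the letter without including them in the match.
spaces_only = re.compile(r"(?<=[A-Z]{2}[0-9]{3}[0-9A-Z]) {5,}(?=[A-Z])")

# Hand-built sample row: node, 6 spaces, link letter, then the downstream
# columns, where the node is followed by only 4 spaces.
sample = " " * 8 + "DD1271" + " " * 6 + "C" + " " * 11 + "DD1271" + " " * 4 + "R"

with_groups = node_link.sub(r"\1_\2", sample)
with_lookarounds = spaces_only.sub("_", sample)

print(with_groups)
print(with_lookarounds)
```

In both cases only the upstream pair is joined with an underscore; the downstream pair is left alone because its gap is shorter than 5 spaces.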