I am working with a text file laid out like this:
SCN DD1251
UPSTREAM DOWNSTREAM FILTER
NODE LINK NODE LINK LINK
DD1271 C DD1271 R
DD1351 D DD1351 B
E
SCN DD1271
UPSTREAM DOWNSTREAM FILTER
NODE LINK NODE LINK LINK
DD1301 T DD1301 A
DD1251 R DD1251 C
SCN DD1301
UPSTREAM DOWNSTREAM FILTER
NODE LINK NODE LINK LINK
DD1271 A DD1271 T
B
C
D
SCN DD1351
UPSTREAM DOWNSTREAM FILTER
NODE LINK NODE LINK LINK
A DD1251 D
DD1251 B
C
I am currently using the following regex pattern to match a node name followed by a 5-space gap and the link letter after it, for example:
DD1251     B
[A-Z]{2}[0-9]{3}[0-9A-Z]     [A-Z]
My goal is to replace the 5-space gap with an underscore, so the result looks like this:
DD1251_B
I am trying to achieve this using the following code:
import re

def RemoveLinkSpace(input_file, output_file, pattern):
    with open(str(input_file) + ".txt", "r") as file_input:
        with open(str(output_file) + ".txt", "w") as output:
            for line in file_input:
                line = pattern.sub("_", line)
                output.write(line)

upstream_pattern = re.compile(r"[A-Z]{2}[0-9]{3}[0-9A-Z]     [A-Z]")
RemoveLinkSpace("File1", "File2", upstream_pattern)
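The reason the node name disappears is that `sub()` replaces the entire match, not just the spaces inside it. A minimal sketch on a single line; the sample string is an assumption (the original column spacing was lost), and ` {5}` is written here as an explicit stand-in for the five literal spaces in the pattern:

```python
import re

# Assumed sample: node name, exactly 5 spaces, then the link letter.
line = "DD1251" + " " * 5 + "B"
pattern = re.compile(r"[A-Z]{2}[0-9]{3}[0-9A-Z] {5}[A-Z]")

# The whole match (node name, gap and letter) is swapped for "_".
result = pattern.sub("_", line)
print(result)  # _
```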
However, this replaces the whole match, node name and link letter included, with an underscore, so the output file looks like this:
SCN DD1251
UPSTREAM DOWNSTREAM FILTER
NODE LINK NODE LINK LINK
_ C DD1271 R
_ D DD1351 B
E
SCN DD1271
UPSTREAM DOWNSTREAM FILTER
NODE LINK NODE LINK LINK
_ T DD1301 A
_ R DD1251 C
My question is: is there a way to still match the entire regex, but replace only the spaces contained within it?
You can capture the parts you want to keep in groups and refer to them in the replacement string: \1 means the first group, \2 the second.
So in the search pattern below, ([A-Z]{2}[0-9]{3}[0-9A-Z]) is the first group and ([A-Z]) is the second.
Also, the gap between group 1 and group 2 in your file is 6 spaces, not 5, so I match 5 or more consecutive spaces with [ ]{5,}.
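To check the group references before processing a whole file, you can apply the pattern to one line. A small sketch; the sample line with 6-space gaps is an assumption, since the exact spacing in the original file was lost:

```python
import re

pattern = re.compile(r"([A-Z]{2}[0-9]{3}[0-9A-Z])[ ]{5,}([A-Z])")

# Assumed sample line: upstream node/link and downstream node/link, 6-space gaps.
gap = " " * 6
line = f"DD1271{gap}C{gap}DD1271{gap}R"

# \1 keeps the node name, \2 keeps the link letter; only the gap becomes "_".
result = pattern.sub(r"\1_\2", line)
print(result)
```

Note that both node/link pairs on the line are joined, because the pattern does not distinguish the upstream column from the downstream one.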
import re

def RemoveLinkSpace(input_file, output_file, pattern):
    with open(str(input_file) + ".txt", "r") as file_input:
        with open(str(output_file) + ".txt", "w") as output:
            for line in file_input:
                # \1 is the node name, \2 the link letter; only the gap becomes "_"
                line = pattern.sub(r"\1_\2", line)
                output.write(line)

upstream_pattern = re.compile(r"([A-Z]{2}[0-9]{3}[0-9A-Z])[ ]{5,}([A-Z])")
RemoveLinkSpace("in", "out", upstream_pattern)
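If you want the match itself to consist of only the spaces, as the question literally asks, lookarounds are an alternative to groups: the node name and link letter are checked but not consumed, so a plain "_" replacement suffices. A sketch using the same assumed 6-space sample line; it relies on the node-name part being fixed width (6 characters), which Python's lookbehind requires:

```python
import re

# (?<=...) requires a 6-character node name before the spaces,
# (?=...) requires a link letter after them; neither is part of the match.
space_pattern = re.compile(r"(?<=[A-Z]{2}[0-9]{3}[0-9A-Z]) {5,}(?=[A-Z])")

# Assumed sample line with 6-space gaps.
gap = " " * 6
line = f"DD1271{gap}C{gap}DD1271{gap}R"

# Only the run of spaces is replaced.
result = space_pattern.sub("_", line)
print(result)
```

This gives the same result as the group-based version; which one to use is mostly a matter of taste, though the group version also works with regex engines that lack lookbehind.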