I have a .txt file that contains dates in columns, as shown below. The comma between the day and the year makes it hard to import the data into pandas with pd.read_csv(). The dates sit inside a text file that also holds other data that should be ignored, so I can't apply a blanket change to the entire document. I need to go through the file, find the dates in this format, and remove the comma inside each date while keeping the commas between dates. What's a simple way to accomplish this?
May 15, 2023, May 22, 2023
August 14, 2023, August 21, 2023
November 14, 2023, November 21, 2023
February 14, 2024, February 22, 2024
You can also use re.findall() to locate the dates, remove the comma inside each one, and then write the output to a new file.
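import re
with open('my_text_file', 'r') as infile, open('output_file', 'w') as outfile:
    file_lines = infile.readlines()
    for line in file_lines:
        # Find every "Month D, YYYY" date on this line
        dates = re.findall(r"\b[A-Za-z]+\s\d+,\s\d+\b", line)
        for date in dates:
            # Remove only the comma inside the date itself
            line = line.replace(date, date.replace(",", ""))
        # Lines without dates are written through unchanged
        outfile.write(line)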
This approach produces the output below, which I think is what you are looking for.
May 15 2023, May 22 2023
August 14 2023, August 21 2023
November 14 2023, November 21 2023
February 14 2024, February 22 2024
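As a quick sanity check that the cleaned lines now split on the remaining commas, here is a minimal sketch using only the standard library. The sample string and the "%B %d %Y" format are assumptions based on the dates shown in the question; pandas is not involved here.

```python
import csv
import io
from datetime import datetime

# Assumed sample: two lines of the cleaned output shown above
cleaned = "May 15 2023, May 22 2023\nAugust 14 2023, August 21 2023\n"

# skipinitialspace drops the blank that follows each separating comma
rows = list(csv.reader(io.StringIO(cleaned), skipinitialspace=True))

# "%B %d %Y" matches a full English month name, the day, and a four-digit year
parsed = [[datetime.strptime(cell, "%B %d %Y").date() for cell in row] for row in rows]
print(parsed[0])
```

Each line now yields exactly two fields, one per date, so a comma-separated reader should see two columns.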
Test Code:
import re
file_data = """May 15, 2023, May 22, 2023
August 14, 2023, August 21, 2023
November 14, 2023, November 21, 2023
February 14, 2024, February 22, 2024
"""
result = ""
# splitlines() avoids the empty trailing entry that split('\n') would produce
for line in file_data.splitlines():
    dates = re.findall(r"\b[A-Za-z]+\s\d+,\s\d+\b", line)
    for date in dates:
        # Remove only the comma inside the date itself
        line = line.replace(date, date.replace(",", ""))
    result += f"{line}\n"
print(result)
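A more compact variant of the same idea is to let re.sub() do the find and replace in one pass. This is only a sketch: the helper name strip_date_commas and the tighter pattern (one or two digits for the day, four for the year) are my own assumptions, not part of the code above.

```python
import re

# Month name, day, comma, four-digit year; the two groups keep everything except the comma
DATE_COMMA = re.compile(r"\b([A-Za-z]+\s\d{1,2}),(\s\d{4})\b")

def strip_date_commas(text):
    """Remove the comma inside each 'Month D, YYYY' date, leaving other commas alone."""
    return DATE_COMMA.sub(r"\1\2", text)

sample = "May 15, 2023, May 22, 2023\nAugust 14, 2023, August 21, 2023\n"
cleaned = strip_date_commas(sample)
print(cleaned)
```

Because the substitution works on the matched span directly, it can be applied to a single line or to the whole file contents at once, and text that does not look like a date is left as it was.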