I'm trying to strip dates from a text file, but everything I have attempted so far returns the entire text file minus the newline and special characters. I've stripped the text file down to a short paragraph for the purposes of checking my code. The contents of the text file are as follows:
03/01/2018
L205:
On-site 7:00 AM, no crew on-site
On-site 11:30 AM crew has excavated for the vessel pad and is watering rock for placement and compaction. Excavation is measured out and meets the requirements including overex on the plans. Off-site 12:00 PM.
CBK-54:
On-site 7:10 AM crew is installing RCP, crew has installed approximately 80 feet from the manhole. Slurry arrives and the manhole is slurried in place. Off-site 8:30 AM
On-site 1:10 PM, crew has installed more RCP and is nearing completion. Soil is holding up well. Off-site 1:40 PM
I want to pull the date "03/01/2018" out of the text file, which is named "Daily_Reports.txt" and is stored on my desktop.
The code I've tried so far is as follows:
import datetime
reports = open('C:/Users/onlyn_000/Desktop/Daily_Reports.txt').read()
print(datetime.datetime.strptime(reports, '%m/%d/%Y').date())
I'm not even sure this is a proper approach to my problem. I would ultimately like to pull out each sentence/paragraph for each site (L205, CBK-54, etc.) to input into an Excel spreadsheet, or even a separate text file for each day. I just want to get the date stripping down as a first step. Any input would be greatly appreciated.
EDIT:
The answer to this question was given by mobone below. The code that worked for me is as follows:
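import datetime
import re

# Read the whole report; the with-block closes the file afterwards.
with open('C:/Users/onlyn_000/Desktop/Daily_Reports.txt') as f:
    reports = f.read()

# Match dates written as MM/DD/YYYY; findall returns them as a list of strings.
dates = re.findall(r'[0-9]{2}/[0-9]{2}/[0-9]{4}', reports)
for x in dates:
    # Convert each matched string into a datetime.date object.
    print(datetime.datetime.strptime(x, '%m/%d/%Y').date())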
EDIT 2:
For future readers' reference: I also realized that re.findall returns a list of strings, and the for loop I wrote only converts those strings into datetime.date objects. I'm not sure I even need date objects for my application; I may be able to use the list as is.
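To make the difference between the two forms concrete, here is a minimal sketch. The sample text and its second date are made up for illustration (they are not from the real file), and the per-day file names are just one possible way the plain strings might be used.

```python
import datetime
import re

# Hypothetical sample text standing in for the contents of Daily_Reports.txt.
sample = "03/01/2018\nL205:\nOn-site 7:00 AM\n03/02/2018\nL205:\nOn-site 6:45 AM\n"

# re.findall returns plain strings, in the order they appear in the text.
date_strings = re.findall(r'[0-9]{2}/[0-9]{2}/[0-9]{4}', sample)

# Strings are enough for labelling, e.g. building one file name per day
# (slashes are replaced because they are not allowed in file names).
file_names = [s.replace('/', '-') + '.txt' for s in date_strings]

# date objects are only needed for date logic such as sorting or differences.
parsed = [datetime.datetime.strptime(s, '%m/%d/%Y').date() for s in date_strings]
days_apart = (parsed[1] - parsed[0]).days

print(date_strings)
print(file_names)
print(days_apart)
```

If the dates are only ever used as labels, the strptime step can probably be dropped; it starts to matter once the dates need to be sorted or compared.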
You'll need something like re.findall(r'[0-9]{2}/[0-9]{2}/[0-9]{4}', reports) to pull the date strings out of the file, and then strptime to parse each one. strptime fails on the whole file because it expects the entire input string to match the format.
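A rough sketch of that approach, wrapped in a function so it can be tried without the file. The names extract_dates and DATE_PATTERN are made up for this example, and the sample string is copied from the question; a real script would pass in the file contents instead.

```python
import datetime
import re

# Two digits, slash, two digits, slash, four digits (MM/DD/YYYY).
DATE_PATTERN = re.compile(r'[0-9]{2}/[0-9]{2}/[0-9]{4}')

def extract_dates(text):
    """Return every MM/DD/YYYY date found in text as a datetime.date."""
    found = []
    for match in DATE_PATTERN.findall(text):
        try:
            found.append(datetime.datetime.strptime(match, '%m/%d/%Y').date())
        except ValueError:
            # Looks like a date but is not a real one (e.g. 13/45/2018); skip it.
            continue
    return found

# Sample text copied from the question.
sample = "03/01/2018\nL205:\nOn-site 7:00 AM, no crew on-site\n"
print(extract_dates(sample))
```

The try/except is optional; it only guards against strings that fit the pattern but are not valid calendar dates.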