
Python strptime not returning a date, but returning the entire text file


I'm trying to strip dates from a text file, but everything I have attempted so far returns the entire text file minus the newline and special characters. I've stripped the text file down to a short paragraph for the purposes of checking my code. The contents of the text file are as follows:

03/01/2018

L205:

On-site 7:00 AM, no crew on-site

On-site 11:30 AM crew has excavated for the vessel pad and is watering rock for placement and compaction. Excavation is measured out and meets the requirements including overex on the plans. Off-site 12:00 PM.

CBK-54:

On-site 7:10 AM crew is installing RCP, crew has installed approximately 80 feet from the manhole. Slurry arrives and the manhole is slurried in place. Off-site 8:30 AM

On-site 1:10 PM, crew has installed more RCP and is nearing completion. Soil is holding up well. Off-site 1:40 PM

I want to extract the date "03/01/2018" from the text file, which is named "Daily_Reports.txt" and is stored on my desktop.

The code I've tried so far is as follows:

import datetime

reports = open('C:/Users/onlyn_000/Desktop/Daily_Reports.txt').read()
print(datetime.datetime.strptime(reports, '%m/%d/%Y').date())

I'm not even sure this is a proper approach to my problem. Ultimately I would like to pull out each sentence/paragraph for each site (L205, CBK-54, etc.) to put into an Excel spreadsheet, or even a separate text file for each day. I just want to get the date extraction working as a first step. Any input would be greatly appreciated.

EDIT:

The answer to this question was given by mobone below. The code that worked for me is as follows:

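import datetime
import re

# Read the whole report file; the with block closes it afterwards.
with open('C:/Users/onlyn_000/Desktop/Daily_Reports.txt') as f:
    reports = f.read()

# Raw string avoids the invalid "\/" escape; \d{4} requires a four-digit year.
dates = re.findall(r'\d{2}/\d{2}/\d{4}', reports)

# Convert each matched string into a datetime.date object.
for x in dates:
    print(datetime.datetime.strptime(x, '%m/%d/%Y').date())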

EDIT 2:

For future readers' reference: I also realized that re.findall returns a list of strings, and the for loop I wrote only converts those strings into datetime.date objects. I'm not sure I even need date objects for my application; I may be able to use the list of strings directly.
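To illustrate the difference between the two forms, here is a minimal sketch. The text variable is a made-up stand-in for the file contents (the second date is hypothetical, added only so there is something to compare). The strings are enough for labeling output; the date objects only matter if you need to sort, compare, or do arithmetic on the dates.

```python
import datetime
import re

# Hypothetical sample text standing in for the report file.
text = "03/01/2018\nL205: On-site 7:00 AM\n03/02/2018\nCBK-54: On-site 7:10 AM\n"

# re.findall returns a plain list of matched strings.
date_strings = re.findall(r'\d{2}/\d{2}/\d{4}', text)

# Converting to datetime.date is optional; it enables sorting and arithmetic.
date_objects = [
    datetime.datetime.strptime(s, '%m/%d/%Y').date() for s in date_strings
]

print(date_strings)
print(date_objects)
print((date_objects[1] - date_objects[0]).days)  # days between the two reports
```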


Solution

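  • strptime requires the entire input string to match the format, so passing it the whole file contents raises a ValueError instead of returning a date. First pull the date strings out of the text with re.findall(r'[0-9][0-9]/[0-9][0-9]/[0-9]{4}', reports), then parse each match with strptime.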
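
The two steps can be sketched as follows. This is only an illustration: the sample string is a hypothetical stand-in for the contents of Daily_Reports.txt, and the pattern assumes dates are always written as MM/DD/YYYY with two-digit month and day.

```python
import datetime
import re

# Hypothetical stand-in for the contents of Daily_Reports.txt.
sample = "03/01/2018\n\nL205:\n\nOn-site 7:00 AM, no crew on-site\n"

# strptime needs the whole string to match the format, so passing the
# full text raises ValueError ("unconverted data remains ...").
try:
    datetime.datetime.strptime(sample, '%m/%d/%Y')
    whole_text_parsed = True
except ValueError:
    whole_text_parsed = False

# Step 1: pull out only the MM/DD/YYYY substrings.
date_strings = re.findall(r'\d{2}/\d{2}/\d{4}', sample)

# Step 2: parse each extracted string on its own.
parsed = [datetime.datetime.strptime(s, '%m/%d/%Y').date() for s in date_strings]

print(whole_text_parsed)
print(parsed)
```

Times such as "7:00 AM" are not picked up because the pattern requires two slashes between digit groups.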