
How to remove unconverted data from a Python datetime object


I have a database of mostly correct datetimes, but a few are broken, like so: Sat Dec 22 12:34:08 PST 20102015

Without the invalid year, this was working for me:

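import time
from datetime import datetime

# soup is assumed to be an already-parsed BeautifulSoup document
end_date = soup('tr')[4].contents[1].renderContents()
end_date = time.strptime(end_date,"%a %b %d %H:%M:%S %Z %Y")
end_date = datetime.fromtimestamp(time.mktime(end_date))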

But once I hit an object with an invalid year I get ValueError: unconverted data remains: 2, which is great, but I'm not sure how best to strip the bad characters out of the year. They range from 2 to 6 unconverted characters.

Any pointers? I would just slice end_date, but I'm hoping there is a datetime-safe strategy.


Solution

  • Yeah, I'd just chop off the extra numbers. Assuming they are always appended to the date string, something like this would work:

    end_date = end_date.split(" ")
    end_date[-1] = end_date[-1][:4]
    end_date = " ".join(end_date)
    

    I was going to try to get the number of excess digits from the exception, but on my installed versions of Python (2.6.6 and 3.1.2) that information isn't actually there; it just says that the data does not match the format. Of course, you could just continue lopping off digits one at a time and re-parsing until you don't get an exception.

    You could also write a regex that will match only valid dates, including the right number of digits in the year, but that seems like overkill.
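
For what it's worth, here is a minimal sketch of the two alternatives mentioned above: re-parsing while dropping one trailing character at a time, and a regex. The regex here is deliberately narrower than "match only valid dates": it only trims a run of digits at the end of the string down to its first four. The names parse_trimming_tail, strip_extra_year_digits and max_trim are made up for illustration, and the default of 6 is taken from the "2 to 6 unconverted characters" in the question.

```python
import re
from datetime import datetime


def parse_trimming_tail(text, fmt, max_trim=6):
    """Try to parse text, dropping one trailing character per attempt.

    max_trim is a hypothetical cap on how many characters may be dropped.
    """
    for cut in range(max_trim + 1):
        candidate = text[:len(text) - cut]
        try:
            return datetime.strptime(candidate, fmt)
        except ValueError:
            # Still unparseable (e.g. unconverted data remains); trim more.
            continue
    raise ValueError("could not parse %r after trimming %d characters"
                     % (text, max_trim))


# A trailing run of five or more digits: keep the first four, drop the rest.
_EXTRA_YEAR_DIGITS = re.compile(r"\b(\d{4})\d+$")


def strip_extra_year_digits(text):
    """Return text with any digits after a 4-digit trailing year removed."""
    return _EXTRA_YEAR_DIGITS.sub(r"\1", text.strip())


if __name__ == "__main__":
    broken = "Sat Dec 22 12:34:08 PST 20102015"
    print(strip_extra_year_digits(broken))
```

Either helper can run before the existing strptime call. One caveat when testing: %Z in strptime only accepts UTC, GMT and the time zone names of the local machine, so a string containing PST may fail to parse elsewhere for reasons unrelated to the year.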