Search code examples
csvdatetimestrptime

Sorting with csv library, error says my dates don't match '%Y-%m-%d' format when it does


I'm trying to sort a CSV by date first then time second. With Pandas, it was easy by using df = df.sort_values(by=['Date', 'Time_UTC']). In the csv library, the code is (from here):

with open ('eqph_csv_29May2020_noF_5lines.csv') as file:
    reader = csv.DictReader(file, delimiter=',') 
    date_sorted = sorted(reader, key=lambda Date: datetime.strptime('Date', '%Y-%m-%d'))
    print(date_sorted)

The datetime documentation clearly says these codes are right. Here's a sample CSV (no delimiter):

Date    Time_UTC    Latitude    Longitude
2020-05-28  05:17:31    16.63   120.43
2020-05-23  02:10:27    15.55   121.72
2020-05-20  12:45:07    5.27    126.11
2020-05-09  19:18:12    14.04   120.55
2020-04-10  18:45:49    5.65    126.54

Solution

  • csv.DictReader returns an iterator that yields a dict for each row in the csv file. To sort it on a column from each row, you need to specify that column in the sort function:

    date_sorted = sorted(reader, key=lambda row: datetime.strptime(row['Date'], '%Y-%m-%d'))
    

    To sort on both Date and Time_UTC, you could combine them into one string and convert that to a datetime:

    date_sorted = sorted(reader, key=lambda row: datetime.strptime(row['Date'] + ' ' + row['Time_UTC'], '%Y-%m-%d %H:%M:%S'))