I'm trying to sort a CSV by date first then time second. With Pandas, it was easy by using df = df.sort_values(by=['Date', 'Time_UTC'])
. In the csv library, the code is (from here):
with open ('eqph_csv_29May2020_noF_5lines.csv') as file:
reader = csv.DictReader(file, delimiter=',')
date_sorted = sorted(reader, key=lambda Date: datetime.strptime('Date', '%Y-%m-%d'))
print(date_sorted)
The datetime documentation clearly says these codes are right. Here's a sample CSV (no delimiter):
Date Time_UTC Latitude Longitude
2020-05-28 05:17:31 16.63 120.43
2020-05-23 02:10:27 15.55 121.72
2020-05-20 12:45:07 5.27 126.11
2020-05-09 19:18:12 14.04 120.55
2020-04-10 18:45:49 5.65 126.54
csv.DictReader
returns an iterator that yields a dict
for each row in the csv file. To sort it on a column from each row, you need to specify that column in the sort function:
date_sorted = sorted(reader, key=lambda row: datetime.strptime(row['Date'], '%Y-%m-%d'))
To sort on both Date
and Time_UTC
, you could combine them into one string and convert that to a datetime
:
date_sorted = sorted(reader, key=lambda row: datetime.strptime(row['Date'] + ' ' + row['Time_UTC'], '%Y-%m-%d %H:%M:%S'))