I have a CSV full of tweets containing a few headers. Among them, for some unknown reason, the date format changes midway from %Y-%m-%d
to %d/%m/%Y
as shown in the image below.
This makes it difficult when trying to export it into another program e.g. Matlab. I'm attempting to solve this in Python, but any other solution would be great.
I've attempted multiple solutions from just googling around. Mainly parsing in a date format when reading the CSV, DateTime.strptime
and others. I'm very new to Python so I'm sorry if I'm a bit clueless
I'm looking to standardise all the dates, e.g. changing the %d/%m/%Y
to the other format, while keeping it individual row separate.
I'm thinking of following the approach held here, but adding an if statement if it recognises a certain format. How would I go about breaking the date down and changing it then?
This might work but I'm too lazy to check it against an image of a CSV file.
import pandas as pd
# Put all the formats into a list
possible_formats = ['%Y-%m-%d', '%d/%m/%Y']
# Read in the data
data = pd.read_csv("data_file.csv")
date_column = "date"
# Parse the dates in each format and stash them in a list
fixed_dates = [pd.to_datetime(data[date_column], errors='coerce', format=fmt) for fmt in possible_formats]
# Anything we could parse goes back into the CSV
data[date_column] = pd.NaT
for fixed in fixed_dates:
data.loc[~pd.isnull(fixed), date_column] = fixed[~pd.isnull(fixed)]
data.to_csv("new_file.csv")