Search code examples
pythonpandascsvdate-conversion

Converting dates with multiple formats in a CSV file


I have a CSV full of tweets containing a few headers. Among them, for some unknown reason, the date format changes midway from %Y-%m-%d to %d/%m/%Y as shown in the image below. This makes it difficult when trying to export it into another program e.g. Matlab. I'm attempting to solve this in Python, but any other solution would be great.

I've attempted multiple solutions from just googling around. Mainly parsing in a date format when reading the CSV, DateTime.strptime and others. I'm very new to Python so I'm sorry if I'm a bit clueless

I'm looking to standardise all the dates, e.g. changing the %d/%m/%Y to the other format, while keeping it individual row separate.

I'm thinking of following the approach held here, but adding an if statement if it recognises a certain format. How would I go about breaking the date down and changing it then?


Solution

  • This might work but I'm too lazy to check it against an image of a CSV file.

    import pandas as pd
    
    # Put all the formats into a list
    possible_formats = ['%Y-%m-%d', '%d/%m/%Y']
    
    # Read in the data
    data = pd.read_csv("data_file.csv")
    date_column = "date"
    
    # Parse the dates in each format and stash them in a list
    fixed_dates = [pd.to_datetime(data[date_column], errors='coerce', format=fmt) for fmt in possible_formats]
    
    # Anything we could parse goes back into the CSV
    data[date_column] = pd.NaT
    for fixed in fixed_dates:
        data.loc[~pd.isnull(fixed), date_column] = fixed[~pd.isnull(fixed)]
    
    data.to_csv("new_file.csv")