Search code examples
jsonpython-3.xpandaspandas.excelwriter

JSON input Datetime not formatting correctly in excel using Pandas Excelwriter


I am trying to read in json into a dataframe in Pandas and then output the df to excel using pandas ExcelWriter. I am getting mixed outputs in excel. Both of the datetimes formats in the json are YYYY-MM-DDTHH:MM:SS.sssZ. For example, 2020-04-23T07:39:51.918Z.

Here is my code:

import pandas as pd
from datetime import datetime


with open('simple_json_test.txt', 'r') as f:
    data = f.readlines()

data = map(lambda x: x.rstrip(), data)
data_json_str = "[" + ','.join(data) + "]"
df = pd.read_json(data_json_str)

print (df.dtypes)

# Write the dataframe to excel
writer = pd.ExcelWriter('simpleJsonToExcelTest.xlsx', engine='xlsxwriter')
df.to_excel(writer, header=True, sheet_name='Pandas_Test',index=False)
writer.save()

I get the following error when I run my code: "ValueError" Excel does not support datetimes with timezones. Please ensure that the datetimes are timezone unaware before writing to Excel"

I output the df.types() to see what types are the colums:

Triggered Time            object
action_time       datetime64[ns]
dtype: object

It's weird since the both seem to be the same format in the json. Here is the json

{"action_time":"2020-04-23T07:39:51.918Z","Triggered Time":"2020-04-23T07:39:51.900Z"}
{"action_time":"2020-04-23T07:39:51.918Z","Triggered Time":"2020-04-23T07:39:51.900Z"}
{"action_time":"2020-04-23T07:39:51.918Z","Triggered Time":"2020-04-23T07:39:51.900Z"}
{"action_time":"2020-04-23T07:39:51.918Z","Triggered Time":"2020-04-23T07:39:51.900Z"}

I made the following updates to the code and got it to run successfully, however the output in the excel file is not the same.

import pandas as pd
from datetime import datetime


with open('simple_json_test.txt', 'r') as f:
    data = f.readlines()

data = map(lambda x: x.rstrip(), data)
data_json_str = "[" + ','.join(data) + "]"
df = pd.read_json(data_json_str)

print (df.dtypes)
df['action_time'] = pd.to_datetime(df['action_time'],errors='coerce',utc=True)
df['action_time'] = df['action_time'].apply(lambda a: datetime.strftime(a, "%Y-%m-%d %H:%M:%S%f")[:-3])
df['action_time'] = pd.to_datetime(df['action_time'], errors='coerce',format='%Y-%m-%d %H:%M:%S%f')

print (df.dtypes)

# Write the dataframe to excel
writer = pd.ExcelWriter('simpleJsonToExcelTest.xlsx', engine='xlsxwriter')
df.to_excel(writer, header=True, sheet_name='Pandas_Test',index=False)
writer.save()

I'm new to pandas, so some of the things I have tried, i don't fully understand and may be incorrect. The output in the excel file is:

action_time column is YYYY-MM-DD HH:MM:SS Triggered Time is YYYY-MM-DDTHH:MM:SS.sssZ

action_time Triggered Time
2020-04-23 07:39:51 2020-04-23T07:39:51.918Z

Triggered time is how i want the format to be (YYYY-MM-DDTHH:MM:SS.sssZ). I need to preserve the milliseconds. Looks like action_time in excel is an actual date field, while triggered time is not.

I even tried converting the datatype of the action_time to object and that didn't work. I'm stuck at this point.


Solution

  • I don't know why "action_time" and "Triggered Time" are parsed with different types but replacing the space in "Triggered Time" converts both to datetime64[ns]. Maybe someone else can explain that part.

    Anyway, with that in place you can format the datetime objects in Excel like this:

    import pandas as pd
    from datetime import datetime
    
    
    with open('simple_json_test.txt', 'r') as f:
        data = f.readlines()
    
    data = map(lambda x: x.rstrip(), data)
    data = map(lambda x: x.replace('Triggered Time', 'Triggered_Time'), data)
    
    data_json_str = "[" + ','.join(data) + "]"
    df = pd.read_json(data_json_str)
    
    print (df.dtypes)
    
    # Write the dataframe to excel
    writer = pd.ExcelWriter('simpleJsonToExcelTest.xlsx',
                            engine='xlsxwriter',
                            datetime_format='yyyy-mm-dd hh:mm:ss.000')
    
    df.to_excel(writer, header=True, sheet_name='Pandas_Test', index=False)
    
    # Widen the column for visibility.
    worksheet = writer.sheets['Pandas_Test']
    worksheet.set_column('A:B', 25)
    
    writer.save()
    

    Strip the timezone from the dates if needed. I didn't have to do that. Output:

    enter image description here

    See also Formatting of the Dataframe output in the XlsxWriter docs.