python-3.x, pandas, python-datetime, sklearn-pandas

Pandas tolist() function is changing default Date time format values


Suppose I have a DataFrame with a column of date values stored with the pandas object dtype.

When I apply the get_date function, its values are converted to datetime format.


When I take the unique values, they remain the same.


But when I take the unique values and convert them to a list, the original values change.


The get_date function passed to .apply() was posted as an image, which is not reproduced here.


Solution

  • This is expected. The documentation for Series.unique says:

    Returns
    ndarray or ExtensionArray
    The unique values returned as a NumPy array. See Notes.

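    So calling tolist on that result uses numpy.ndarray.tolist, which converts each element to a native Python scalar. A datetime64[ns] value has nanosecond precision, which datetime.datetime cannot hold, so NumPy returns integers (nanoseconds since the epoch) instead: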

    import pandas as pd

    rng = pd.date_range('2017-04-03', periods=3)
    a = pd.DataFrame({'DATEDATACHANGED': rng.append(rng)})
    print (a)
      DATEDATACHANGED
    0      2017-04-03
    1      2017-04-04
    2      2017-04-05
    3      2017-04-03
    4      2017-04-04
    5      2017-04-05
    
    print (a['DATEDATACHANGED'].unique())
    ['2017-04-03T00:00:00.000000000' '2017-04-04T00:00:00.000000000'
     '2017-04-05T00:00:00.000000000']
    
    print (a['DATEDATACHANGED'].unique().tolist())
    [1491177600000000000, 1491264000000000000, 1491350400000000000]
    

    To convert the Series itself to a list, use pandas.Series.tolist, which returns Timestamp objects:

    print (a['DATEDATACHANGED'].tolist())
    [Timestamp('2017-04-03 00:00:00'), Timestamp('2017-04-04 00:00:00'), 
     Timestamp('2017-04-05 00:00:00'), Timestamp('2017-04-03 00:00:00'), 
     Timestamp('2017-04-04 00:00:00'), Timestamp('2017-04-05 00:00:00')]
    

    For a list of the unique values only, call Series.drop_duplicates first:

    print (a['DATEDATACHANGED'].drop_duplicates().tolist())
    [Timestamp('2017-04-03 00:00:00'), Timestamp('2017-04-04 00:00:00'), 
     Timestamp('2017-04-05 00:00:00')]
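
If you already hold the array returned by unique, the following is a minimal sketch of how the integers arise and two ways to avoid them. It is not part of the original answer. It works on a plain NumPy array so the result does not depend on the pandas version (newer pandas releases may return a DatetimeArray from Series.unique, in which case tolist already gives Timestamps). The variable names and the sample dates are assumptions chosen to match the example above.

```python
import numpy as np
import pandas as pd

# Assumed sample data: the same three dates as above, repeated once,
# stored explicitly at nanosecond resolution.
values = np.array(
    ["2017-04-03", "2017-04-04", "2017-04-05"] * 2, dtype="datetime64[ns]"
)

# np.unique keeps the datetime64[ns] dtype.
unique_ns = np.unique(values)

# Nanosecond precision does not fit in datetime.datetime, so
# numpy.ndarray.tolist falls back to integers (nanoseconds since 1970-01-01).
as_ints = unique_ns.tolist()

# Casting to microsecond resolution first gives datetime.datetime objects.
as_datetimes = unique_ns.astype("datetime64[us]").tolist()

# Wrapping the array with pd.to_datetime gives pandas Timestamp objects.
as_timestamps = pd.to_datetime(unique_ns).tolist()

print(as_ints)
print(as_datetimes)
print(as_timestamps)
```

The microsecond cast drops any sub-microsecond part, which does not matter for whole dates like these. Use pd.to_datetime if you want to stay with pandas types.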