python-3.x pandas python-datetime sklearn-pandas

Pandas tolist() function is changing default Date time format values

Consider a DataFrame with a date column like this, stored with the pandas object dtype.

When I apply the get_date function, its values are converted to datetime format.

When I take the unique values, they stay the same.

But when I take the unique values and convert them to a list, the original values change.

The get_date function passed to .apply looks like this:

Solution

This is expected. The documentation for Series.unique says:

Returns
ndarray or ExtensionArray
The unique values returned as a NumPy array. See Notes.

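So when unique returns a NumPy datetime64[ns] array, as it does here, chaining tolist calls numpy.ndarray.tolist, which converts each value to a native Python object. For nanosecond-resolution datetimes that is an integer (nanoseconds since the epoch):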

import pandas as pd

rng = pd.date_range('2017-04-03', periods=3)
a = pd.DataFrame({'DATEDATACHANGED': rng.append(rng)})
print (a)
  DATEDATACHANGED
0      2017-04-03
1      2017-04-04
2      2017-04-05
3      2017-04-03
4      2017-04-04
5      2017-04-05

print (a['DATEDATACHANGED'].unique())
['2017-04-03T00:00:00.000000000' '2017-04-04T00:00:00.000000000'
 '2017-04-05T00:00:00.000000000']

print (a['DATEDATACHANGED'].unique().tolist())
[1491177600000000000, 1491264000000000000, 1491350400000000000]
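
The integers come from NumPy rather than pandas. The following minimal sketch uses a hand-built NumPy array (an assumption made for illustration, not the asker's data) to show that the result of numpy.ndarray.tolist depends on the datetime64 resolution, and that the integers can be converted back with pd.to_datetime:

```python
import numpy as np
import pandas as pd

# Hand-built array with nanosecond resolution, mirroring the dates above.
arr_ns = np.array(['2017-04-03', '2017-04-04', '2017-04-05'],
                  dtype='datetime64[ns]')

# datetime.datetime cannot hold nanoseconds, so NumPy falls back to
# integers: nanoseconds since 1970-01-01.
as_ints = arr_ns.tolist()
print(as_ints)

# At microsecond resolution the values fit into datetime.datetime objects.
as_datetimes = arr_ns.astype('datetime64[us]').tolist()
print(as_datetimes)

# The integers can be turned back into Timestamp objects if needed.
restored = pd.to_datetime(as_ints, unit='ns').tolist()
print(restored)
```

So the values themselves are not altered; only their Python representation differs.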

To convert the Series to a list, use pandas.Series.tolist, which returns Timestamp objects:

print (a['DATEDATACHANGED'].tolist())
[Timestamp('2017-04-03 00:00:00'), Timestamp('2017-04-04 00:00:00'), 
 Timestamp('2017-04-05 00:00:00'), Timestamp('2017-04-03 00:00:00'), 
 Timestamp('2017-04-04 00:00:00'), Timestamp('2017-04-05 00:00:00')]

And for unique values, call Series.drop_duplicates first:

print (a['DATEDATACHANGED'].drop_duplicates().tolist())
[Timestamp('2017-04-03 00:00:00'), Timestamp('2017-04-04 00:00:00'), 
 Timestamp('2017-04-05 00:00:00')]
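
If the goal is a list of date strings rather than Timestamp objects, one possible approach is to format the column before converting it. This is only a sketch: the '%Y-%m-%d' format is an assumption, since the question's original display format is not shown.

```python
import pandas as pd

rng = pd.date_range('2017-04-03', periods=3)
a = pd.DataFrame({'DATEDATACHANGED': rng.append(rng)})

# Unique values as Timestamp objects, using Series methods only.
unique_ts = a['DATEDATACHANGED'].drop_duplicates().tolist()

# Unique values as plain strings; '%Y-%m-%d' is an assumed display format.
unique_str = (a['DATEDATACHANGED']
              .drop_duplicates()
              .dt.strftime('%Y-%m-%d')
              .tolist())
print(unique_str)
```

Staying within Series methods avoids the NumPy conversion entirely, so no integer timestamps appear.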