I have a large data set. I converted the CSV into a DataFrame with pandas. One column contains years from 1965 to 2015, but most values are abbreviated. A sample of this column looks like:
1965.0
66.0
67.0
.
.
.
69.0
1970.0
71.0
.
.
79.0
1980.0
.
.
.
2000.0
1.0
2.0
.
.
.
15.0
So my question is: how can I convert this whole column to a 4-digit format without the trailing .0?
BTW, when I checked my data with .info(), this column shows as:
Year 51 non-null object
Thank you
You could convert the column to float, apply a custom function that adds 1900 or 2000 as appropriate, and then cast the result to int if that is more useful to you. Example:
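import pandas as pd

df = pd.DataFrame({'y': ['1970.0',
                         '71.0',
                         '79.0',
                         '1980.0',
                         '2000.0',
                         '1.0',
                         '2.0',
                         '15.0']})

def to_4digit(i):
    # values below 1900 are abbreviated years
    if i < 1900:
        # 65 and above belong to the 1900s, anything lower to the 2000s
        if i >= 65:
            return 1900 + i
        return 2000 + i
    # already a full 4-digit year
    return i

# parse the strings as floats, expand to 4 digits, then drop the ".0"
df['y'] = df['y'].astype(float).apply(to_4digit).astype(int)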
# df['y']
# 0 1970
# 1 1971
# 2 1979
# 3 1980
# 4 2000
# 5 2001
# 6 2002
# 7 2015
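If the column is large, the same logic can probably be written without a row-by-row apply. The following is only a sketch under a few assumptions: the column is named 'Year' as in the .info() output, it has no missing values (otherwise the cast to int would fail), and 65 is the right cutoff between the 1900s and the 2000s, which follows from the stated 1965 to 2015 range. The sample values are made up to mirror the question.

```python
import pandas as pd

# Hypothetical sample mirroring the question's 'Year' column (strings, dtype object).
df = pd.DataFrame({'Year': ['1965.0', '66.0', '1970.0', '79.0', '2000.0', '1.0', '15.0']})

# Parse the strings as numbers, then drop the ".0" by casting to int.
# Assumes there are no missing values in the column.
years = pd.to_numeric(df['Year']).astype(int)

# Assumed cutoff: abbreviated values 65-99 belong to the 1900s, 0-64 to the 2000s.
century = (years >= 65).map({True: 1900, False: 2000})

# Keep values that are already full years; add the century to the rest.
df['Year'] = years.where(years >= 1900, years + century)
```

Series.where keeps the original value wherever the condition holds and takes the replacement elsewhere, so full 4-digit years pass through untouched and only the abbreviated ones are shifted.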