
Normalizing the YEAR column to 4 digits in Python


I have a large data set. I converted the CSV into a DataFrame with pandas. One column contains the years from 1965 to 2015. A sample of this column looks like this:

1965.0
  66.0
  67.0
   .
   .
   .
  69.0
1970.0
  71.0
   .
   .
  79.0
1980.0
   . 
   .
   .
2000.0
   1.0
   2.0
    .
    .
    .
  15.0

So my question is: how can I convert this whole column to a 4-digit format without the trailing .0?

BTW, when I checked my data with .info(), this column is shown as:

Year                51 non-null    object

Thank you


Solution

  • You could convert the column to float, then apply a custom function that adds 1900 or 2000 as appropriate (in the example, two-digit values of 65 and above become 19xx and lower values become 20xx). Cast the result to int if that is more useful to you. Example:

    import pandas as pd
    
    df = pd.DataFrame({'y': ['1970.0',
                               '71.0',
                               '79.0',
                             '1980.0',
                             '2000.0',
                                '1.0',
                                '2.0',
                               '15.0']})
    
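    def to_4digit(i):
        # values below 1900 are two-digit years that still need a century
        if i < 1900:
            # 65..99 -> 1965..1999 (the data starts in 1965)
            if i >= 65:
                return 1900 + i
            # 0..64 -> 2000..2064
            return 2000 + i
        # already a 4-digit year
        return i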
    
    df['y'] = df['y'].astype(float).apply(to_4digit).astype(int)
    # df['y']
    # 0    1970
    # 1    1971
    # 2    1979
    # 3    1980
    # 4    2000
    # 5    2001
    # 6    2002
    # 7    2015
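
For a large data set, the same rule can probably be applied without a row-by-row `apply`. The following is only a sketch under a few assumptions: `expand_years` and its `pivot` parameter are hypothetical names made up for illustration, the pivot of 65 is taken from the fact that the data starts in 1965, and the column is assumed to contain only numeric strings (possibly padded with spaces).

```python
import numpy as np
import pandas as pd

def expand_years(col, pivot=65):
    """Hypothetical helper: turn a column of mixed 2-/4-digit year strings into 4-digit ints."""
    # strip padding, then parse the strings as numbers (raises on non-numeric values)
    y = pd.to_numeric(col.astype(str).str.strip())
    # century for two-digit values: >= pivot means 19xx, otherwise 20xx
    century = np.where(y >= pivot, 1900, 2000)
    # keep values that are already 4-digit years; add the century to the rest
    full = y.where(y >= 1900, y + century)
    return full.astype(int)

years = expand_years(pd.Series(['1965.0', '  66.0', '1970.0', '71.0',
                                '2000.0', '1.0', '15.0']))
```

Used on the real frame this would be something like `df['Year'] = expand_years(df['Year'])`. Note that a missing value would make the final `astype(int)` fail, so any NaN rows would need to be handled first.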