Search code examples
pythondatepandasdataframeiso

Convert a common date format in an ISO week date format


i have this dataframe with this kind of date format

           Date  Week Number   Influenza[it]  Febbre[it]  Rinorrea[it]  
0    2008-01-01             1            220         585           103   
1    2008-01-08             2            403         915           147   
2    2008-01-15             3            366         895           136   
3    2008-01-22             4            305         825           136   
4    2008-01-29             5            311         837           121 
... ...

I'd like to convert the date format in the ISO week date format like this dataframe (because i need to intersect the two dataframes with the same dates, based on the years and weeks). The format is like "year-weeknumberoftheyear".

0     2007-42
1     2007-43
2     2007-44
3     2007-45
4     2007-46
... ...

So i was able just to find the ISO weeks of the first dataframe in this way:

wiki = pd.read_csv('file.csv', parse_dates=['Date'])
for i,d in wiki.iterrows():
    print d.Date.isocalendar()[1]

Output:

1
2
3
4
...

But i don't know how to make a date format like the second dataframe (in the way "year-weeknumberoftheyear")


Solution

  • You could use a vectorized approach instead after the read operation:

    df['Date'] = pd.to_datetime(df['Date']).dt.strftime('%Y-%V')
    df['Date']
    0    2008-01
    1    2008-02
    2    2008-03
    3    2008-04
    4    2008-05
    Name: Date, dtype: object
    

    Here, %V is the directive corresponding to ISO 8601 week number.


    demo:

    from io import StringIO
    data = StringIO(
    '''
    Date     Week Number   Influenza[it]  Febbre[it]  Rinorrea[it]  
    2008-01-01             1            220         585           103   
    2008-01-08             2            403         915           147   
    2008-01-15             3            366         895           136   
    2008-01-22             4            305         825           136   
    2008-01-29             5            311         837           121
    ''')
    df = pd.read_csv(data, sep='\s{2,}', parse_dates=['Date'], engine='python')
    df
    

    enter image description here

    df['Date'].dtypes
    dtype('<M8[ns]')
    
    df['Date'].dt.strftime('%Y-%V')
    0    2008-01
    1    2008-02
    2    2008-03
    3    2008-04
    4    2008-05
    Name: Date, dtype: object
    

    edit: (though inefficient, only for reproducibility purposes)

    L = ['{}-{}'.format(d.Date.isocalendar()[0], str(d.Date.isocalendar()[1]).zfill(2)) for i,d in wiki.iterrows()]
    

    Construct series:

    >>> pd.Series(L)
    0    2008-01
    1    2008-02
    2    2008-03
    3    2008-04
    4    2008-05
    dtype: object