Tags: python, pandas, python-datetime

My week number 1 should start from 1 January for any year. How do I do this in Python (pandas DataFrame)?


I have dates in Excel serial format in a DataFrame, as shown below.

code

import pandas as pd

df = pd.DataFrame([44562.24909722, 44562.27255787, 44562.28564815], columns=['orderdate'])
print(df)

output

Out[255]: 
      orderdate
0  44562.249097
1  44562.272558
2  44562.285648

Then I convert the Excel serial dates to real datetimes using pandas:

code

df['date'] = pd.to_datetime(df.orderdate, unit='D', origin='1899-12-30')

Out[258]: 
      orderdate                          date
0  44562.249097 2022-01-01 05:58:41.999808256
1  44562.272558 2022-01-01 06:32:28.999967744
2  44562.285648 2022-01-01 06:51:20.000159744

Now I need to extract week numbers from this date column. I use:

df['weekno'] = df.date.dt.isocalendar().week

Output

      orderdate                          date  weekno
0  44562.249097 2022-01-01 05:58:41.999808256      52
1  44562.272558 2022-01-01 06:32:28.999967744      52
2  44562.285648 2022-01-01 06:51:20.000159744      52

So in the output I get weekno = 52 for 1 January 2022. I understand this is the ISO calendar convention, but I want week number 1 to start on 1 January every year. How can I do that?
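
For context on the 52: ISO 8601 weeks start on Monday, and week 1 is the week containing the year's first Thursday. 1 January 2022 is a Saturday, so it still belongs to the last ISO week of 2021. A quick check (the variable names `ts` and `iso` are only illustrative):

```python
import pandas as pd

ts = pd.Timestamp("2022-01-01")

# isocalendar() returns (ISO year, ISO week, ISO weekday), with Monday = 1.
iso = ts.isocalendar()

print(ts.day_name())  # Saturday
print(tuple(iso))     # (2021, 52, 6): ISO year 2021, week 52, weekday 6
```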


Solution

  • Try using strftime like this:

    import pandas as pd
    df = pd.DataFrame([44562.24909722, 44562.27255787, 44562.28564815],columns=['orderdate'])
    df['date'] = pd.to_datetime(df.orderdate, unit='D', origin='1899-12-30')
    
    # This line is new
    df['week_of_year'] = df["date"].apply(lambda x: int(x.strftime("%W")))
    print(df)
    

    Output

          orderdate                          date  week_of_year
    0  44562.249097 2022-01-01 05:58:41.999808256             0
    1  44562.272558 2022-01-01 06:32:28.999967744             0
    2  44562.285648 2022-01-01 06:51:20.000159744             0
    

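    %W counts weeks starting on Monday, and all days before the year's first Monday fall in week 0.

    Now, if you absolutely need the numbering to start at 1 instead of 0, simply add 1 :) Be aware that when 1 January itself is a Monday, %W is already 1 for that day, so the shifted value starts at 2 in those years.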

    df['week_of_year'] = df["date"].apply(lambda x: int(x.strftime("%W"))+1)
    

    Output:

          orderdate                          date  week_of_year
    0  44562.249097 2022-01-01 05:58:41.999808256             1
    1  44562.272558 2022-01-01 06:32:28.999967744             1
    2  44562.285648 2022-01-01 06:51:20.000159744             1
    

    P.S. You can check this useful tutorial on strftime. I find it a much more flexible tool for formatting dates than the ones built into pandas.
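
If "week 1 starts on 1 January" is meant literally (days 1-7 of the year are week 1, days 8-14 are week 2, and so on, regardless of weekday), one possible alternative is to derive the week from the day of the year. This is a sketch under that assumption, not the approach from the answer above; the column names `week_from_jan1` and `strftime_W_plus_1` and the extra dates in `check` are made up for illustration.

```python
import pandas as pd

# Same sample data as in the question (Excel serial dates).
df = pd.DataFrame(
    [44562.24909722, 44562.27255787, 44562.28564815],
    columns=["orderdate"],
)
df["date"] = pd.to_datetime(df["orderdate"], unit="D", origin="1899-12-30")

# Day of year 1-7 -> week 1, 8-14 -> week 2, and so on.
df["week_from_jan1"] = (df["date"].dt.dayofyear - 1) // 7 + 1

# Hypothetical extra dates, only to compare the two numbering schemes.
check = pd.DataFrame(
    {"date": pd.to_datetime(["2022-01-03", "2024-01-01", "2024-12-31"])}
)
check["strftime_W_plus_1"] = check["date"].dt.strftime("%W").astype(int) + 1
check["week_from_jan1"] = (check["date"].dt.dayofyear - 1) // 7 + 1

print(df)
print(check)
```

With this scheme the weeks are fixed 7-day blocks counted from 1 January, so they do not line up with Monday-to-Sunday calendar weeks, and the final block of the year (week 53) contains only one or two days. For Monday 1 January 2024, the day-of-year version gives 1 while `%W` plus 1 gives 2.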