Tags: python, apache-spark, pyspark, spark-koalas

Is there a better solution than `df['weekofyear'] = df['date'].dt.weekofyear`?

The problem with this solution is that the days after the last full week of year n but before the first week of year n+1 are sometimes counted as week 1 and not as week 0.
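This boundary behavior comes from ISO week numbering, which `dt.weekofyear` follows. A quick check with the standard library (no Spark needed) illustrates it:

```python
from datetime import date

# 2018-12-31 is a Monday: it falls in calendar year 2018 but is
# counted as ISO week 1 (of 2019), not week 0 or 53.
iso_year, iso_week, _ = date(2018, 12, 31).isocalendar()
print(iso_year, iso_week)  # 2019 1

# The day before still belongs to the last week of 2018.
print(date(2018, 12, 30).isocalendar()[:2])  # (2018, 52)
```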

I am working with pyspark and koalas (no pandas allowed).

Here is an example:

Problematic df

As you can see, the first column is the date, the second the week, the third the month, and the last the year.


Solution

  • Not sure if this is exactly what you want, but you can use a `case when` expression to replace the undesired week-of-year values:

    import databricks.koalas as ks

    # ISO week number from the date column
    df['weekofyear'] = df['date'].dt.weekofyear

    # Remap the boundary case: a December date reported as week 1
    # belongs to the last week (53) of the current year instead
    df2 = ks.sql("""
    select
        date,
        case when weekofyear = 1 and month = 12 then 53 else weekofyear end as weekofyear,
        month,
        year
    from {df}""")
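For what it's worth, the remapping rule in that `case when` can also be written as a plain Python function (a sketch with a made-up name, not tied to any Spark or koalas API), which makes it easy to unit-test before wiring it into SQL:

```python
def fix_weekofyear(week: int, month: int) -> int:
    """Remap a December date reported as ISO week 1 to week 53.

    Mirrors: case when weekofyear = 1 and month = 12 then 53
             else weekofyear end
    """
    return 53 if week == 1 and month == 12 else week

print(fix_weekofyear(1, 12))   # 53 (December counted as week 1 of next year)
print(fix_weekofyear(1, 1))    # 1  (a genuine first week is untouched)
print(fix_weekofyear(52, 12))  # 52 (ordinary late-December week is untouched)
```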