Search code examples
pythonpandasdataframegroup-bypivot-table

How to create pivot table by 6 month interval rather than year?


I have simple daily dataframe df:

date_rng = pd.date_range(start='2023-01-01', end='2024-01-05', freq='D')
data = np.random.rand(len(date_rng), 3)
df = pd.DataFrame(data, columns=['Column1', 'Column2', 'Column3'], index=date_rng)

And instead of using date_rng.dt.year to pivot by year, I would like it in 6 month date increments.

pivot_df = pd.pivot_table(df, values='Vessel', index=date_rng.dt.year, columns='Col1',     aggfunc=pd.Series.nunique)

Any suggestions?

I tried using dt.quarter but this just shows the unique counts by quarter, not be year and quarter. Really any custom grouping other than just yearly would be ideal.


Solution

  • You can try (note I've added Vessel column to make as you have in your example in the question):

    date_rng = pd.date_range(start="2023-01-01", end="2024-01-05", freq="D")
    data = np.random.rand(len(date_rng), 3)
    df = pd.DataFrame(data, columns=["Column1", "Column2", "Column3"], index=date_rng)
    
    # added Vessel column
    df["Vessel"] = np.random.randint(1, 5, size=len(date_rng))
    
    pivot_df = pd.pivot_table(
        df,
        index=[df.index.year, np.where(df.index.month <= 6, "H1", "H2")],
        columns="Vessel",
        values=["Column1", "Column2", "Column3"],
        aggfunc="nunique",
    )
    print(pivot_df)
    

    Prints:

            Column1                   Column2                   Column3                  
    Vessel        1     2     3     4       1     2     3     4       1     2     3     4
    2023 H1    39.0  41.0  59.0  42.0    39.0  41.0  59.0  42.0    39.0  41.0  59.0  42.0
         H2    43.0  53.0  34.0  54.0    43.0  53.0  34.0  54.0    43.0  53.0  34.0  54.0
    2024 H1     NaN   1.0   3.0   1.0     NaN   1.0   3.0   1.0     NaN   1.0   3.0   1.0
    

    EDIT: To convert index back to dates:

    pivot_df.index = [
        pd.to_datetime(f'{year}-{"01-01" if half == "H1" else "06-01"}')
        for year, half in pivot_df.index
    ]
    print(pivot_df)
    

    Prints:

               Column1                   Column2                   Column3                  
    Vessel           1     2     3     4       1     2     3     4       1     2     3     4
    2023-01-01    48.0  44.0  43.0  46.0    48.0  44.0  43.0  46.0    48.0  44.0  43.0  46.0
    2023-06-01    49.0  41.0  48.0  46.0    49.0  41.0  48.0  46.0    49.0  41.0  48.0  46.0
    2024-01-01     1.0   1.0   NaN   3.0     1.0   1.0   NaN   3.0     1.0   1.0   NaN   3.0