I have simple daily dataframe df:
date_rng = pd.date_range(start='2023-01-01', end='2024-01-05', freq='D')
data = np.random.rand(len(date_rng), 3)
df = pd.DataFrame(data, columns=['Column1', 'Column2', 'Column3'], index=date_rng)
And instead of using date_rng.dt.year to pivot by year, I would like it in 6 month date increments.
pivot_df = pd.pivot_table(df, values='Vessel', index=date_rng.dt.year, columns='Col1', aggfunc=pd.Series.nunique)
Any suggestions?
I tried using dt.quarter but this just shows the unique counts by quarter, not be year and quarter. Really any custom grouping other than just yearly would be ideal.
You can try (note I've added Vessel
column to make as you have in your example in the question):
date_rng = pd.date_range(start="2023-01-01", end="2024-01-05", freq="D")
data = np.random.rand(len(date_rng), 3)
df = pd.DataFrame(data, columns=["Column1", "Column2", "Column3"], index=date_rng)
# added Vessel column
df["Vessel"] = np.random.randint(1, 5, size=len(date_rng))
pivot_df = pd.pivot_table(
df,
index=[df.index.year, np.where(df.index.month <= 6, "H1", "H2")],
columns="Vessel",
values=["Column1", "Column2", "Column3"],
aggfunc="nunique",
)
print(pivot_df)
Prints:
Column1 Column2 Column3
Vessel 1 2 3 4 1 2 3 4 1 2 3 4
2023 H1 39.0 41.0 59.0 42.0 39.0 41.0 59.0 42.0 39.0 41.0 59.0 42.0
H2 43.0 53.0 34.0 54.0 43.0 53.0 34.0 54.0 43.0 53.0 34.0 54.0
2024 H1 NaN 1.0 3.0 1.0 NaN 1.0 3.0 1.0 NaN 1.0 3.0 1.0
EDIT: To convert index back to dates:
pivot_df.index = [
pd.to_datetime(f'{year}-{"01-01" if half == "H1" else "06-01"}')
for year, half in pivot_df.index
]
print(pivot_df)
Prints:
Column1 Column2 Column3
Vessel 1 2 3 4 1 2 3 4 1 2 3 4
2023-01-01 48.0 44.0 43.0 46.0 48.0 44.0 43.0 46.0 48.0 44.0 43.0 46.0
2023-06-01 49.0 41.0 48.0 46.0 49.0 41.0 48.0 46.0 49.0 41.0 48.0 46.0
2024-01-01 1.0 1.0 NaN 3.0 1.0 1.0 NaN 3.0 1.0 1.0 NaN 3.0