I have a pandas dataframe with time series data, where the columns look like this:
Customer | Item | Date | 00:00 | 00:30 | 01:00 | ... | 23:30 |
---|---|---|---|---|---|---|---|
XYZ | A | 2020-01-01 | 0 | 1 | 2 | ... | 3 |
XYZ | B | 2020-01-02 | 0 | 2 | 2 | ... | 5 |
ABC | A | 2020-01-01 | 0 | 1 | 5 | ... | 3 |
ABC | B | 2020-01-02 | 0 | 2 | 2 | ... | 1 |
So the times of day are in the columns instead of the rows. I want to reshape this dataframe: combine each time column with the date column and turn them into separate rows, like this:
Customer | Date | Item A | Item B |
---|---|---|---|
XYZ | 2020-01-01 00:00 | 1 | 2 |
XYZ | 2020-01-01 00:30 | 1 | 2 |
XYZ | 2020-01-01 01:00 | 1 | 2 |
XYZ | 2020-01-02 00:00 | 1 | 2 |
XYZ | 2020-01-02 00:30 | 1 | 2 |
XYZ | 2020-01-02 01:00 | 1 | 2 |
ABC | 2020-01-01 00:00 | 2 | 3 |
ABC | 2020-01-01 00:30 | 2 | 2 |
ABC | 2020-01-01 01:00 | 4 | 2 |
ABC | 2020-01-02 00:00 | 2 | 3 |
ABC | 2020-01-02 00:30 | 2 | 2 |
ABC | 2020-01-02 01:00 | 4 | 2 |
How can I do this? I tried a method using a cross join, but it is very inefficient because I have a lot of rows (~100,000).
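You could try the following (with `df` being your dataframe): melt the time columns into rows, add each time to its date, then pivot the items back into columns.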
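```python
import pandas as pd

df["Date"] = pd.to_datetime(df["Date"])
df = (
    df.rename(columns={"Item": "Items"})
    # Wide -> long: one row per (Customer, Items, Date, Time), value stored in "Item"
    .melt(id_vars=["Customer", "Items", "Date"], var_name="Time", value_name="Item")
    # "00:30" + ":00" -> "00:30:00", which to_timedelta parses as 30 minutes
    .assign(Date=lambda d: d["Date"] + pd.to_timedelta(d["Time"] + ":00"))
    .drop(columns="Time")
    # Long -> wide: one column per item, giving MultiIndex columns ("Item", "A"), ...
    .pivot(index=["Customer", "Date"], columns="Items")
    .reset_index()
)
# Flatten the columns: ("Item", "A") -> "Item A", ("Customer", "") -> "Customer"
df.columns = [a if not b else f"{a} {b}" for a, b in df.columns]
```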
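To check the approach end to end, here is a self-contained sketch that wraps the same steps in a function and runs them on a truncated copy of the question's input. Only the `00:00`, `00:30` and `01:00` columns are used, because the remaining time columns are elided in the question; the function name `reshape` is hypothetical. It also assumes the time column labels are strings like `"00:30"` (if they were read in as `datetime.time` objects, the `+ ":00"` step would need `.astype(str)` first).

```python
import pandas as pd


def reshape(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: one column per time slot -> one row per timestamp, one column per item."""
    df = df.copy()  # avoid mutating the caller's frame
    df["Date"] = pd.to_datetime(df["Date"])
    out = (
        df.rename(columns={"Item": "Items"})
        .melt(id_vars=["Customer", "Items", "Date"], var_name="Time", value_name="Item")
        .assign(Date=lambda d: d["Date"] + pd.to_timedelta(d["Time"] + ":00"))
        .drop(columns="Time")
        .pivot(index=["Customer", "Date"], columns="Items")
        .reset_index()
    )
    # Flatten ("Item", "A") -> "Item A" and ("Customer", "") -> "Customer"
    out.columns = [a if not b else f"{a} {b}" for a, b in out.columns]
    return out


# Truncated copy of the question's input (first three time columns only).
sample = pd.DataFrame(
    {
        "Customer": ["XYZ", "XYZ", "ABC", "ABC"],
        "Item": ["A", "B", "A", "B"],
        "Date": ["2020-01-01", "2020-01-02", "2020-01-01", "2020-01-02"],
        "00:00": [0, 0, 0, 0],
        "00:30": [1, 2, 1, 2],
        "01:00": [2, 2, 5, 2],
    }
)

result = reshape(sample)
print(result)
```

In this truncated sample each item appears on only one date, so the other item's column is NaN on those rows (and the item columns become floats). Also note that `pivot` raises a `ValueError` if a (Customer, Date, Items) combination occurs more than once; in that case `pivot_table` with an explicit `aggfunc` is the usual alternative.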