Tags: python, pandas, timeseries

How can I reshape this pandas DataFrame of time series data to make it easier to use?


I have a pandas DataFrame with time series data, whose columns look like this:

Customer  Item  Date        00:00  00:30  01:00  ...  23:30
XYZ       A     2020-01-01  0      1      2      ...  3
XYZ       B     2020-01-02  0      2      2      ...  5
ABC       A     2020-01-01  0      1      5      ...  3
ABC       B     2020-01-02  0      2      2      ...  1
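
For experimenting, here is a small reproducible version of this input, built from the four rows shown above. It is a sketch that keeps only the 00:00, 00:30 and 01:00 columns (the elided columns up to 23:30 are left out), and the variable name df is an assumption.

```python
import pandas as pd

# The four rows from the example; only three of the half-hour columns are
# included, the elided columns up to 23:30 are left out.
df = pd.DataFrame(
    {
        "Customer": ["XYZ", "XYZ", "ABC", "ABC"],
        "Item": ["A", "B", "A", "B"],
        "Date": ["2020-01-01", "2020-01-02", "2020-01-01", "2020-01-02"],
        "00:00": [0, 0, 0, 0],
        "00:30": [1, 2, 1, 2],
        "01:00": [2, 2, 5, 2],
    }
)
print(df)
```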

So the times of day are in the columns instead of the rows. I want to reshape this DataFrame: combine each time column with the date column into a single timestamp, and make each timestamp a separate row, like this:

Customer  Date              Item A  Item B
XYZ       2020-01-01 00:00  1       2
XYZ       2020-01-01 00:30  1       2
XYZ       2020-01-01 01:00  1       2
XYZ       2020-01-02 00:00  1       2
XYZ       2020-01-02 00:30  1       2
XYZ       2020-01-02 01:00  1       2
ABC       2020-01-01 00:00  2       3
ABC       2020-01-01 00:30  2       2
ABC       2020-01-01 01:00  4       2
ABC       2020-01-02 00:00  2       3
ABC       2020-01-02 00:30  2       2
ABC       2020-01-02 01:00  4       2
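
Taken on its own, the "combine the time column with the date column" step amounts to joining each date string with a time label and parsing the result. A minimal sketch with three values from the tables above (the Series names date_strings and time_labels are illustrative, not from the question):

```python
import pandas as pd

# Three dates and the matching wide-format column labels ("HH:MM" strings).
date_strings = pd.Series(["2020-01-01", "2020-01-01", "2020-01-01"])
time_labels = pd.Series(["00:00", "00:30", "01:00"])

# Concatenate "YYYY-MM-DD" + " " + "HH:MM" and parse into datetime64 values.
timestamps = pd.to_datetime(date_strings + " " + time_labels)
print(timestamps)
```

In the full reshape the time labels come from the column headers, so this join can only happen after the time columns have been turned into rows.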

How can I do this? I tried an approach using a cross join, but it is very inefficient because I have a lot of rows (~100,000).


Solution

  • You could try the following (where df is your DataFrame):

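    import pandas as pd

    df["Date"] = pd.to_datetime(df["Date"])
    df = (
        # Free up the name "Item" for the value column that melt creates
        df.rename(columns={"Item": "Items"})
        # Unpivot: one row per (Customer, Items, Date, time column)
        .melt(id_vars=["Customer", "Items", "Date"], var_name="Time", value_name="Item")
        # "00:30" -> "00:30:00" -> 30-minute Timedelta, added to the date
        .assign(Date=lambda d: d["Date"] + pd.to_timedelta(d["Time"] + ":00"))
        .drop(columns="Time")
        # One column per item; the columns become ("Item", "A"), ("Item", "B")
        .pivot(index=["Customer", "Date"], columns="Items")
        .reset_index()
    )
    # Flatten the column MultiIndex: ("Item", "A") -> "Item A", ("Customer", "") -> "Customer"
    df.columns = [a if not b else f"{a} {b}" for a, b in df.columns]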
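
An alternative that may be worth comparing (it is not the method used in the answer above) is to stack the time columns into the row index and then unstack the item level into columns. This sketch assumes the sample rows from the question with only the 00:00, 00:30 and 01:00 columns; the names long and wide are illustrative.

```python
import pandas as pd

# Sample rows from the question (only three of the half-hour columns).
df = pd.DataFrame(
    {
        "Customer": ["XYZ", "XYZ", "ABC", "ABC"],
        "Item": ["A", "B", "A", "B"],
        "Date": ["2020-01-01", "2020-01-02", "2020-01-01", "2020-01-02"],
        "00:00": [0, 0, 0, 0],
        "00:30": [1, 2, 1, 2],
        "01:00": [2, 2, 5, 2],
    }
)

# Move the time columns into the row index: one row per (Customer, Item, Date, Time).
long = (
    df.set_index(["Customer", "Item", "Date"])
    .rename_axis(columns="Time")
    .stack()
    .rename("Value")
    .reset_index()
)

# Join the date string and the "HH:MM" label into a single timestamp.
long["Date"] = pd.to_datetime(long["Date"] + " " + long["Time"])

# Spread the items back out as columns named "Item A", "Item B".
wide = (
    long.set_index(["Customer", "Date", "Item"])["Value"]
    .unstack("Item")
    .add_prefix("Item ")
    .rename_axis(columns=None)
    .reset_index()
)
print(wide)
```

In these sample rows each customer has item A and item B on different dates, so combinations that do not exist come out as NaN (and the item columns become floats); the melt/pivot version behaves the same way.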