Tags: python, pandas, sparse-matrix

How to create a sparse-matrix-like DataFrame from a DataFrame in pandas/Python


I have a DataFrame like this: [image: input data]

I want to convert it to something like the desired result below. Note that ds is the day someone visited and can take values from 0 to 31: for days not visited the cell should be 0, and for days visited it should be 1. It's kind of like a sparse matrix. Can someone help? [image: desired result]


Solution

  • Update: pd.get_dummies now accepts sparse=True, which backs the resulting dummy columns with a SparseArray.

    pd.get_dummies(s: pd.Series) can be used to create a one-hot encoding, like this:

    import pandas as pd

    header = ["ds", "buyer_id", "email_address"]
    data = [[23, 305, "fatin1bd@gmail.com"],
            [22, 307, "shovonbad@gmail.com"],
            [25, 411, "raisulk@gmail.com"],
            [22, 588, "saiful.sdp@hotmail.com"],
            [24, 664, "osman.dhk@gmail.com"]]
    df = pd.DataFrame(data, columns=header)

    # One-hot encode the visit day; dtype=int gives 0/1 (pandas >= 2.0 defaults to bool)
    df.join(pd.get_dummies(df["ds"], dtype=int))
    

    Output:

       ds  buyer_id           email_address  22  23  24  25
    0  23       305      fatin1bd@gmail.com   0   1   0   0
    1  22       307     shovonbad@gmail.com   1   0   0   0
    2  25       411       raisulk@gmail.com   0   0   0   1
    3  22       588  saiful.sdp@hotmail.com   1   0   0   0
    4  24       664     osman.dhk@gmail.com   0   0   1   0
    

    Just for added clarification: by default the resulting DataFrame is still stored in a dense format. For true sparse storage, pass sparse=True (see the update above) or convert the dummy columns to a scipy.sparse matrix format such as CSR or COO.
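The question asks for a column for every day from 0 to 31, but get_dummies only creates columns for the days that actually occur in the data (22 to 25 in this sample). A minimal sketch of one way to fill in the missing days, assuming ds is always an integer day in that range and that plain 0/1 integers are wanted, is to reindex the dummy columns over range(32) with fill_value=0 before joining. Passing dtype=int matters on pandas 2.0 and later, where get_dummies returns booleans by default.

```python
import pandas as pd

header = ["ds", "buyer_id", "email_address"]
data = [[23, 305, "fatin1bd@gmail.com"],
        [22, 307, "shovonbad@gmail.com"],
        [25, 411, "raisulk@gmail.com"],
        [22, 588, "saiful.sdp@hotmail.com"],
        [24, 664, "osman.dhk@gmail.com"]]
df = pd.DataFrame(data, columns=header)

# One-hot encode the visit day as 0/1 integers (pandas >= 2.0 defaults to bool).
dummies = pd.get_dummies(df["ds"], dtype=int)

# Assumption: days run from 0 to 31. Add a column for every such day;
# days nobody visited are filled with 0.
dummies = dummies.reindex(columns=range(32), fill_value=0)

# Attach the 32 day columns to the original data, aligned on the row index.
result = df.join(dummies)
print(result)
```

Reindexing only the dummy frame before the join leaves the original columns untouched, and the day columns come out in order from 0 to 31.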
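For storage that is actually sparse, two options are sketched below on the same sample data. The first stays in pandas: sparse=True backs each dummy column with a SparseArray (fill value 0), and the DataFrame's .sparse.to_coo() accessor hands the result to SciPy. The second builds a scipy.sparse CSR matrix directly from (row, day) index pairs, which also yields all 32 day columns; it assumes every ds value is an integer in 0..31, since SciPy raises an error for a column index outside the declared shape.

```python
import numpy as np
import pandas as pd
from scipy import sparse

header = ["ds", "buyer_id", "email_address"]
data = [[23, 305, "fatin1bd@gmail.com"],
        [22, 307, "shovonbad@gmail.com"],
        [25, 411, "raisulk@gmail.com"],
        [22, 588, "saiful.sdp@hotmail.com"],
        [24, 664, "osman.dhk@gmail.com"]]
df = pd.DataFrame(data, columns=header)

# Option 1: sparse=True backs each dummy column with a SparseArray whose
# fill value is 0, so only the 1s are stored explicitly.
sparse_dummies = pd.get_dummies(df["ds"], sparse=True, dtype=int)
print(sparse_dummies.dtypes)

# The .sparse accessor converts an all-sparse DataFrame to a SciPy COO matrix
# (columns 22..25 only, because get_dummies sees only the days present).
coo = sparse_dummies.sparse.to_coo()

# Option 2: build a CSR matrix straight from (row, day) pairs.
# Assumption: every ds value is an integer day in 0..31, giving 32 columns.
n_rows = len(df)
visits = sparse.csr_matrix(
    (np.ones(n_rows, dtype=np.int8), (np.arange(n_rows), df["ds"].to_numpy())),
    shape=(n_rows, 32),
)
print(visits.shape, visits.nnz)  # (5, 32) 5
```

With option 2 the day is implied by the column position, so matrix column 23 corresponds to day 23, and the other row data (buyer_id, email_address) has to be kept alongside it separately.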