
Multiplication between different rows of a dataframe


I have several dataframes looking like this:

time_hr cell_hour id attitude hour
0.028611 xxx 1 Cruise 1.0
0.028333 xxx 4 Cruise 1.0
0.004722 xxx 16 Cruise 1.0

I want to perform a specific multiplication between rows of the 'time_hr' column.
I need to multiply each row by every other row and store the values for later use.
E.g. if the column values are [2, 3, 4], I would need the values 2x3, 2x4, 3x2, 3x4, 4x2, 4x3.
Part of the problem is that I have several dataframes with different numbers of rows, so I need a generic way of doing this.
Is there a way? Thanks in advance.


Solution

  • It sounds like a cartesian product to me:

    from io import StringIO

    import pandas as pd

    # sample data reading
    data1 = """
    time_hr cell_hour   id  attitude    hour
    0.028611    xxx 1   Cruise  1.0
    0.028333    xxx 4   Cruise  1.0
    0.004722    xxx 16  Cruise  1.0
    """
    # sep=r"\s+" splits on any run of whitespace, so tabs and spaces both work
    df = pd.read_csv(StringIO(data1), sep=r"\s+")

    # filtering dataset to needed columns
    df_time = df[["id", "time_hr"]]
    # pair every row with every row (how='cross' needs pandas >= 1.2)
    df_comb = df_time.merge(df_time, how='cross')
    # drop the pairs of a row with itself; ids are assumed to be unique
    df_comb = df_comb[df_comb["id_x"] != df_comb["id_y"]].copy()
    df_comb["time_hr"] = df_comb["time_hr_x"] * df_comb["time_hr_y"]
    df_comb.drop(columns=["time_hr_x", "time_hr_y"]).set_index(["id_x", "id_y"])
    
    #               time_hr
    #id_x   id_y    
    #1      4       0.000811
    #       16      0.000135
    #4      1       0.000811
    #       16      0.000134
    #16     1       0.000135
    #       4       0.000134
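
Since the question is also tagged numpy, the same off-diagonal products could probably be obtained with `np.outer` as well. The sketch below uses the `[2, 3, 4]` example from the question. The helper name `pairwise_products` is made up for illustration, and rows are paired by position rather than by `id`.

```python
import numpy as np
import pandas as pd


def pairwise_products(values):
    """Return the products of every ordered pair of distinct positions."""
    arr = np.asarray(values)
    n = len(arr)
    # outer[i, j] == arr[i] * arr[j] for every pair of positions
    outer = np.outer(arr, arr)
    # keep everything except the diagonal (a row multiplied by itself)
    rows, cols = np.nonzero(~np.eye(n, dtype=bool))
    return pd.Series(
        outer[rows, cols],
        index=pd.MultiIndex.from_arrays([rows, cols], names=["row_x", "row_y"]),
    )


products = pairwise_products([2, 3, 4])
print(products.tolist())  # [6, 8, 6, 12, 8, 12]
```

This avoids the merge entirely, but the full n x n matrix is still built in memory, so for very long columns neither approach is cheap.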
    

    If you want more reusable code, you can parameterise the column names:

    id_column = "id"
    product_columns = ["time_hr"]
    
    df_time = df[[id_column, *product_columns]]
    df_comb = df_time.merge(df_time, how='cross')
    # .copy() avoids a SettingWithCopyWarning when the product columns are added
    df_comb = df_comb[df_comb[f"{id_column}_x"] != df_comb[f"{id_column}_y"]].copy()
    for column in product_columns:
        df_comb[column] = df_comb[f"{column}_x"] * df_comb[f"{column}_y"]
    # index by the id pair and keep only the product columns
    df_comb.set_index([f"{id_column}_x", f"{id_column}_y"])[product_columns]
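
Because the question mentions several dataframes with different numbers of rows, the same steps could be wrapped in a function and applied to each frame. This is only a sketch: the function name `cross_products` and the two sample frames are invented for illustration, `how="cross"` assumes pandas 1.2 or later, and the id column is assumed to hold unique values.

```python
import pandas as pd


def cross_products(df, id_column="id", product_columns=("time_hr",)):
    """Multiply every row with every other row for the given columns."""
    product_columns = list(product_columns)
    subset = df[[id_column, *product_columns]]
    # pair every row with every row, then drop the self-pairs
    comb = subset.merge(subset, how="cross")
    comb = comb[comb[f"{id_column}_x"] != comb[f"{id_column}_y"]].copy()
    for column in product_columns:
        comb[column] = comb[f"{column}_x"] * comb[f"{column}_y"]
    # index by the id pair and return only the product columns
    return comb.set_index([f"{id_column}_x", f"{id_column}_y"])[product_columns]


# hypothetical frames with different numbers of rows
frames = {
    "a": pd.DataFrame({"id": [1, 4, 16], "time_hr": [2.0, 3.0, 4.0]}),
    "b": pd.DataFrame({"id": [7, 9], "time_hr": [0.5, 4.0]}),
}
results = {name: cross_products(frame) for name, frame in frames.items()}
print(results["a"])
```

Nothing in the function depends on the number of rows, so each frame yields n * (n - 1) products.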
    

    P.S. I am not sure whether this is what you were trying to achieve. If not, please add the expected output for those 3 input rows.