Search code examples
pythonpandassumtypeerror

Running sums in pandas with row specification


I have some data where I am trying to calculate the total measurement for all counts and sum of the measurements for counts 2, 3, and 4, for each lot item number of each lot. Ideally, I would have 2 extra columns on the original data that has the total measurement and measurements for counts 2, 3, and 4-- even though those values would be duplicated, they would be represented with each record. Here is an example of the dataset:

Date    Sample Type Lot #   Lot item #  Count   Measurement
0   2021-12-05  G   ABS123-G    1   1   5.0
1   2021-12-05  G   ABS123-G    1   2   3.0
2   2021-12-05  G   ABS123-G    1   3   7.0
3   2021-12-05  G   ABS123-G    1   4   25.1
4   2021-12-05  G   ABS123-G    1   5   66.0
5   2021-12-05  G   ABS123-G    1   6   54.0
6   2021-12-05  G   ABS123-G    1   7   12.0
7   2021-12-05  G   ABS123-G    1   8   0.0
8   2021-12-05  G   ABS123-G    1   9   1.0
9   2021-12-05  G   ABS123-G    1   10  5.0
10  2021-12-05  G   ABS123-G    2   1   2.0
11  2021-12-05  G   ABS123-G    2   2   4.0
12  2021-12-05  G   ABS123-G    2   3   889.0
13  2021-12-05  G   ABS123-G    2   4   12.4
14  2021-12-05  G   ABS123-G    2   5   51.4
15  2021-12-05  G   ABS123-G    2   6   12.0
16  2021-12-05  G   ABS123-G    2   7   14.0
17  2021-12-05  G   ABS123-G    2   8   2.0
18  2021-12-05  G   ABS123-G    2   9   1.0
19  2021-12-05  G   ABS123-G    2   10  0.1
20  2021-12-05  B   ABS123-B    1   1   4.0
21  2021-12-05  B   ABS123-B    1   2   58.0
22  2021-12-05  B   ABS123-B    1   3   123.0
23  2021-12-05  B   ABS123-B    1   4   12.5
24  2021-12-05  B   ABS123-B    1   5   11.0
25  2021-12-05  B   ABS123-B    1   6   135.5
26  2021-12-05  B   ABS123-B    1   7   17.0
27  2021-12-05  B   ABS123-B    1   8   1.0
28  2021-12-05  B   ABS123-B    1   9   5.0
29  2021-12-05  B   ABS123-B    1   10  0.3

My approach was to try and filter the counts to 2,3, 4, calculate the sum, and then join the df to the original based on the lot and lot item # and then do something similar with the total. However, I am running into an error when I trying to sum.

df2 = df.loc[(df['Count'] == 2) | (df['Count'] == 3) | (df['Count'] == 4)]
df2['Counts 2,3,4'] = df2.grouby(['Lot #, 'Lot item #'])['Measurement'].sum()
df2

TypeError: incompatible index of inserted column with frame index

The filter worked, but not the second part. First, I don't know what is causing the error, do I need to reset the index? Also, is this the correct approach? Any suggestions are welcomed.


Solution

  • We can use isin to simplify the equality checks by defining a list of integer values. We can then use join after groupby sum and specify the columns to join on. Lastly rename the Series to the new column name:

    # Specify columns to groupby and join back on
    grp_cols = ['Lot #', 'Lot item #']
    joined = df.join(
        df[df['Count'].isin([2, 3, 4])]  # Values to include
            .groupby(grp_cols)['Measurement'].sum()  # Take sum per group
            .rename('Counts 2,3,4'),  # Specify new column name
        on=grp_cols,
    )
    

    joined:

              Date Sample Type     Lot #  Lot item #  Count  Measurement  Counts 2,3,4
    0   2021-12-05           G  ABS123-G           1      1          5.0          35.1
    1   2021-12-05           G  ABS123-G           1      2          3.0          35.1
    2   2021-12-05           G  ABS123-G           1      3          7.0          35.1
    3   2021-12-05           G  ABS123-G           1      4         25.1          35.1
    4   2021-12-05           G  ABS123-G           1      5         66.0          35.1
    5   2021-12-05           G  ABS123-G           1      6         54.0          35.1
    6   2021-12-05           G  ABS123-G           1      7         12.0          35.1
    7   2021-12-05           G  ABS123-G           1      8          0.0          35.1
    8   2021-12-05           G  ABS123-G           1      9          1.0          35.1
    9   2021-12-05           G  ABS123-G           1     10          5.0          35.1
    10  2021-12-05           G  ABS123-G           2      1          2.0         905.4
    11  2021-12-05           G  ABS123-G           2      2          4.0         905.4
    12  2021-12-05           G  ABS123-G           2      3        889.0         905.4
    13  2021-12-05           G  ABS123-G           2      4         12.4         905.4
    14  2021-12-05           G  ABS123-G           2      5         51.4         905.4
    15  2021-12-05           G  ABS123-G           2      6         12.0         905.4
    16  2021-12-05           G  ABS123-G           2      7         14.0         905.4
    17  2021-12-05           G  ABS123-G           2      8          2.0         905.4
    18  2021-12-05           G  ABS123-G           2      9          1.0         905.4
    19  2021-12-05           G  ABS123-G           2     10          0.1         905.4
    20  2021-12-05           B  ABS123-B           1      1          4.0         193.5
    21  2021-12-05           B  ABS123-B           1      2         58.0         193.5
    22  2021-12-05           B  ABS123-B           1      3        123.0         193.5
    23  2021-12-05           B  ABS123-B           1      4         12.5         193.5
    24  2021-12-05           B  ABS123-B           1      5         11.0         193.5
    25  2021-12-05           B  ABS123-B           1      6        135.5         193.5
    26  2021-12-05           B  ABS123-B           1      7         17.0         193.5
    27  2021-12-05           B  ABS123-B           1      8          1.0         193.5
    28  2021-12-05           B  ABS123-B           1      9          5.0         193.5
    29  2021-12-05           B  ABS123-B           1     10          0.3         193.5
    

    Sample DataFrame constructor:

    import pandas as pd
    
    df = pd.DataFrame({
        'Date': pd.to_datetime(
            ['2021-12-05', '2021-12-05', '2021-12-05', '2021-12-05', '2021-12-05',
             '2021-12-05', '2021-12-05', '2021-12-05', '2021-12-05', '2021-12-05',
             '2021-12-05', '2021-12-05', '2021-12-05', '2021-12-05', '2021-12-05',
             '2021-12-05', '2021-12-05', '2021-12-05', '2021-12-05', '2021-12-05',
             '2021-12-05', '2021-12-05', '2021-12-05', '2021-12-05', '2021-12-05',
             '2021-12-05', '2021-12-05', '2021-12-05', '2021-12-05', '2021-12-05']),
        'Sample Type': ['G', 'G', 'G', 'G', 'G', 'G', 'G', 'G', 'G', 'G', 'G', 'G',
                        'G', 'G', 'G', 'G', 'G', 'G', 'G', 'G', 'B', 'B', 'B', 'B',
                        'B', 'B', 'B', 'B', 'B', 'B'],
        'Lot #': ['ABS123-G', 'ABS123-G', 'ABS123-G', 'ABS123-G', 'ABS123-G',
                  'ABS123-G', 'ABS123-G', 'ABS123-G', 'ABS123-G', 'ABS123-G',
                  'ABS123-G', 'ABS123-G', 'ABS123-G', 'ABS123-G', 'ABS123-G',
                  'ABS123-G', 'ABS123-G', 'ABS123-G', 'ABS123-G', 'ABS123-G',
                  'ABS123-B', 'ABS123-B', 'ABS123-B', 'ABS123-B', 'ABS123-B',
                  'ABS123-B', 'ABS123-B', 'ABS123-B', 'ABS123-B', 'ABS123-B'],
        'Lot item #': [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
                       1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
        'Count': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 1,
                  2, 3, 4, 5, 6, 7, 8, 9, 10],
        'Measurement': [5.0, 3.0, 7.0, 25.1, 66.0, 54.0, 12.0, 0.0, 1.0, 5.0, 2.0,
                        4.0, 889.0, 12.4, 51.4, 12.0, 14.0, 2.0, 1.0, 0.1, 4.0,
                        58.0, 123.0, 12.5, 11.0, 135.5, 17.0, 1.0, 5.0, 0.3]
    })