Search code examples
pythonsumduplicatesdata-analysis

Python - Summing values and number of duplicates


I have csv file looking like this: part of the data.

X and Y are my coordinates of pixel.

I need to filter column ADC only for TDC values (in this column are also 0 values), and after this I need to sum up the energy value for every unique value of pixel, so for every x=0 y=0, x=0 y=1, x=0 y=2... until x=127 y=127. And in another column I need the number of duplicates for the pixel coordinates that occurs (so the number of places/rows from which I need to make a summation in the Energy column).

I don't know how to write the appropriate conditions for this kind of task. I will appreciate any type of help.


Solution

  • The following StackOverflow question and answers might help you out: Group dataframe and get sum AND count?

    But here is some code for your case which might be useful, too:

    # import the pandas package, for doing data analysis and manipulation
    import pandas as pd
    
    # create a dummy dataframe using data of the type you are using (I hope)
    df = pd.DataFrame(
        data = {
            "X": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
            "Y": [0, 0, 1, 1, 1, 1, 1, 1, 2, 2],
            "ADC": ["TDC", "TDC", "TDC", "TDC", "TDC", 0, 0, 0, "TDC", "TDC"],
            "Energy": [0, 0, 1, 1, 1, 2, 2, 2, 3, 3],
            "Time": [1.2, 1.2, 2.3, 2.3, 3.6, 3.61, 3.62, 0.66, 0.67, 0.68],
        }
    )
    
    # use pandas' groupby method and aggregation methods to get the sum of the energy in every unique combination of X and Y, and the number of times those combinations appear
    df[df["ADC"] == "TDC"].groupby(by=["X","Y"]).agg({"Energy": ['sum','count']}).reset_index()
    

    The result I get from this in my dummy example is:

       X  Y Energy      
               sum count
    0  0  0      0     2
    1  0  1      3     3
    2  0  2      6     2