python pandas pymysql

How to sum a column grouped by the values in the first column of a DataFrame


This is my data frame (attached as an image), and I want to sum the values grouped by the first column, where ?? represents no value for that entry. I want to write code that produces output like this:

HWK 0
CSP 0
DCF 0

Entries that are n/a or ?? should be ignored when summing, so that I can apply the same procedure across the whole data frame.

HWK      0 
CSP     ?? 
DCF      0 
BM_IO   ?? 
CS      ?? 
os      ?? 
CS       0 
ESC      0 
ESC      0 
DCF      0 
DLHLTL  ?? 
CSP      0 
CSP      0 
ESC      0 
CSP     ??

Solution

  • I assume that your source DataFrame has columns code and quantity, and that quantity is of type object (it holds both strings and numbers).

    Then one possible solution is as follows:

    • df.quantity.astype(str) - Convert the quantity values to strings (np.nan values become the string 'nan').
    • .str.isdigit() - Check whether each string consists only of digit characters.
    • [...] - The resulting boolean Series selects which rows to keep (from df).
    • .groupby('code').sum() - With the table limited to the numeric rows, a single groupby (by code) followed by sum gives the total for each group.

    To sum up:

    df[df.quantity.astype(str).str.isdigit()].groupby('code').sum()
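
    As a rough, self-contained sketch of the steps above, the sample rows from the question can be rebuilt by hand. The column names code and quantity are the assumption made in this answer, not names taken from the original post. The extra astype cast is also an addition: if the digits happen to be stored as strings, sum would concatenate them instead of adding. The second variant, based on pd.to_numeric with errors="coerce", is an alternative technique rather than the one-liner above.

```python
import pandas as pd

# Sample rows copied from the question. The column names `code` and
# `quantity` are assumed for illustration; the post does not name them.
df = pd.DataFrame({
    "code": ["HWK", "CSP", "DCF", "BM_IO", "CS", "os", "CS", "ESC",
             "ESC", "DCF", "DLHLTL", "CSP", "CSP", "ESC", "CSP"],
    "quantity": [0, "??", 0, "??", "??", "??", 0, 0,
                 0, 0, "??", 0, 0, 0, "??"],
})

# Approach from the answer: keep only rows whose quantity is all digits.
mask = df.quantity.astype(str).str.isdigit()

# Cast the surviving values to int before summing (an added safeguard,
# so that digit strings such as "0" are added rather than concatenated).
filtered_sum = df[mask].astype({"quantity": int}).groupby("code").sum()

# Alternative: coerce anything non-numeric ("??", "n/a") to NaN.
# sum() skips NaN, so groups with no numeric rows come out as 0.
numeric = pd.to_numeric(df["quantity"], errors="coerce")
coerced_sum = numeric.groupby(df["code"]).sum()

print(filtered_sum)
print(coerced_sum)
```

    The two variants differ in which groups they report: filtering first drops codes that have no numeric rows at all (BM_IO, os and DLHLTL in this sample), while coercing keeps every code and reports 0 for those. Note also that str.isdigit() rejects negative numbers and decimals such as "-1" or "1.5", so the filtering approach is only suitable for non-negative integers.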