
Pandas DataFrame - Sum a column based on values in another column


I have data that looks like this :-

data = {"doc1": {'a': 2, 'b': 1, 'c': 3}, "doc2": {'a': 1, 'b': 1, 'c': 3}, "doc3": {'a': 1, 'b': 1, 'c': 3}}

I convert it into a dataframe :-

import pandas as pd

df = pd.DataFrame.from_dict(data, orient='index')

Dataframe looks like this :-

      a  c  b
doc1  2  3  1
doc2  1  3  1
doc3  1  3  1

Now I want to sum all the values in column b where the value in column a is 1.

So the value I want will be 2.

Is there an easy way to do this without iterating through both columns? I checked other posts and found this :-

This makes use of the .loc function:

df.loc[df['a'] == 1, 'b'].sum()

But for some reason, I can't seem to make it work with my dataframe.

Please let me know.

Thanks.


Solution

  • You are very close. See below.

    >>> df[df['a'] == 1]['b'].sum()
    2
    

    Instead of using .loc, try just filtering the dataframe first (df[df['a'] == 1]), then selecting the column 'b', and then summing.

    Edit: I'll leave this here for future reference, although depending on the version of pandas you're using, your solution should work (thanks, @maxymoo). I'm running 0.18.1 and both approaches worked.
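
For anyone who wants to check both approaches side by side, here is a minimal sketch that rebuilds the dataframe from the question and computes the sum each way. The names mask, total_filtered and total_loc are just illustrative and do not come from the original post. It assumes column a holds integers, as in the sample data.

```python
import pandas as pd

# Sample data copied from the question.
data = {
    "doc1": {"a": 2, "b": 1, "c": 3},
    "doc2": {"a": 1, "b": 1, "c": 3},
    "doc3": {"a": 1, "b": 1, "c": 3},
}
df = pd.DataFrame.from_dict(data, orient="index")

# Boolean mask: True for rows where column a equals 1 (doc2 and doc3).
mask = df["a"] == 1

# Approach from the answer: filter the rows first, then select column b.
total_filtered = df[mask]["b"].sum()

# Approach from the question: select rows and column in one .loc call.
total_loc = df.loc[mask, "b"].sum()

print(total_filtered, total_loc)  # 2 2
```

Both expressions read the same rows, so they should agree for a plain lookup like this one. The single .loc call is usually the safer habit if you later need to assign to the selection, since chained indexing can end up operating on a copy.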