python pandas list-comprehension

List comprehension with multiple conditions on different columns


I have the following df:

import pandas as pd

data = [['Male', 'Agree'], ['Male', 'Agree'], ['Male', 'Disagree'], ['Female', 'Neutral']]

df = pd.DataFrame(data, columns=['Sex', 'Opinion'])
df

and would like to get the total number of rows where Sex is Male and Opinion is either Agree or Disagree. I expect the answer to be 3 but instead get 9.

sum([True for x in df['Opinion'] for y in df['Sex'] if x in ['Agree','Disagree'] if y=='Male' ])

I have already done this with other methods; I'm trying to understand list comprehensions better.


Solution

  • Let's unpack this a bit. The original statement

    total = sum([True for x in df['Opinion'] for y in df['Sex'] if x in ['Agree','Disagree'] if y=='Male' ])
    

    is equivalent to

    total = 0
    for x in df['Opinion']:
        for y in df['Sex']:
            if x in ['Agree', 'Disagree']:
                if y=='Male':
                    total += 1
    

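    The two for clauses nest, so every opinion is paired with every value of Sex (4 × 4 = 16 pairs), not just the one in the same row. Three opinions are Agree or Disagree and three Sex values are Male, so 3 × 3 = 9 pairs pass both tests, which is why you get 9.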

    What you actually want is to consider only corresponding pairs from two equal-length iterables, i.e. the values that sit in the same row. Python's built-in zip does exactly this:

    total = 0
    for x,y in zip(df['Opinion'], df['Sex']):
        if x in ['Agree', 'Disagree'] and y=='Male':
            total += 1
    

    or as a comprehension (here a generator expression passed directly to sum)

    total = sum(1 for x,y in zip(df['Opinion'], df['Sex']) if x in ['Agree', 'Disagree'] and y=='Male')
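
To see the difference between the two pairings side by side, here is a minimal sketch that swaps the DataFrame columns for plain lists holding the same values (an assumption made for brevity; iterating over a pandas Series yields its values in the same order). The names `sex`, `opinion`, `wanted`, `cross_pairs` and `row_pairs` are illustrative and not from the original post. `itertools.product` stands in for the two nested `for` clauses.

```python
from itertools import product

# Plain lists standing in for df['Sex'] and df['Opinion'] (same values, same order).
sex = ['Male', 'Male', 'Male', 'Female']
opinion = ['Agree', 'Agree', 'Disagree', 'Neutral']
wanted = ['Agree', 'Disagree']

# Two nested `for` clauses pair every opinion with every sex: 4 * 4 = 16 pairs.
cross_pairs = list(product(opinion, sex))
cross_total = sum(1 for x, y in cross_pairs if x in wanted and y == 'Male')

# zip pairs only the values that share a position (the same row): 4 pairs.
row_pairs = list(zip(opinion, sex))
row_total = sum(1 for x, y in row_pairs if x in wanted and y == 'Male')

print(len(cross_pairs), cross_total)  # 16 9
print(len(row_pairs), row_total)      # 4 3
```

The filter is identical in both cases; only the set of pairs it runs over changes, which is what turns the expected 3 into 9.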