Search code examples
excelcsvprobabilitybayesian

How to select certain rows in Excel that meet logical criteria of A & B


I have an excel sheet in CSV that has 8 columns A-H and thousands of rows with values 0 or 1 depending on truth value.

I'm looking for the Excel function in which I can select rows where column A and B are true so that I can check another columns probability given A&B. IE P((A&B)|D) (where | means given).

I'm really new to excel and having difficulties finding how to only select rows that meet this criteria.


Solution

  • The following formula entered in I1 will return a 1 if both A1 and B1 are true.

    =IF(AND($A1=1,$B1=1),1,0)
    

    Copy it down or autofill to identify all rows where A and B are true.

    The $ sign before A and B make the column references absolute meaning if you drag the formula to the right, the references to columns A and B will remain.

    Because Excel implicitly interprets 0 = FALSE and 1 (or any other number) = TRUE the formula could be shortened to:

    =IF(AND($A1,$B1),1,0)
    

    The probability of C being 1 given that A and B are 1 can be calculated by counting all rows where A, B and C are all 1 and dividing by the number of rows where both A and B are 1:

    =COUNTIFS($A:$A,"1",$B:$B,"1",C:C,"1")/COUNTIFS($A:$A,"1",$B:$B,"1")
    

    Again, references to A and B are absolute, while C is relative so you can drag right to get probabilities for columns D to H.

    COUNTIFS only counts the rows where all of the criteria are met and allows you to specify up to 127 range/criteria pairs.

    EDIT You could also use:

    =AVERAGEIFS(C:C,$A:$A,1,$B:$B,1)
    

    to get the probability.