I have a Pandas crosstab created from NumPy arrays of zeros and ones. After creating crosstab by:ct = pd.crosstab(index=np.array(labels), columns=np.array(cols)
I have a crosstab that looks like:
col_0 0 1
row_0
---------------------
0 618 53
1 53 79
I wanted to add a column in here for error percentages on false positives and false negatives. I added a new column to the crosstab using ct[2] = ct.apply(lambda row: row[0] + row[1], axis = 1)
which added a column of sum of the first two columns. I would like to add a fourth column that would calculate ct[0][0] divided by the sum for the first row, and ct[1][1] divided by the sum for the second row, and add that column to ct[3]. I tried:
for i,y in ct.iterrows():
ct[3] = ct.apply(lambda y: y[i]/y[2] , axis = 1)
but since it executes twice, it will overwrite the column with the 2nd result. I tried using ct[3][i]
to no avail. Using lambda y: y[1] if y==0 else y[0]
doesn't work. What is the method for adding a new column to the dataframe / crosstab, with a condition-based calculation on existing values?
Try this using numpy and pandas:
ct[3] = (ct * np.eye(2)).sum() / ct.sum(1)
or
ct[3] = np.diag(ct) / ct.sum(1)
Output:
col_0 0 1 3
row_0
0 618 53 0.921013
1 53 79 0.598485