My dataframe is similar to the table below. I have 6 columns, each with 'Yes' or 'No' if a specific antibiotic was given.
AZITH | CLIN | CFTX | METRO | CFTN | DOXY | TREATED |
---|---|---|---|---|---|---|
Yes | Yes | No | No | No | No | |
No | Yes | No | Yes | No | No | |
Yes | Yes | No | No | No | No | |
No | No | No | No | No | No | |
Yes | Yes | Yes | Yes | Yes | Yes | |
No | Yes | Yes | Yes | No | Yes | |
No | No | No | No | No | Yes | |
No | No | No | No | No | No | |
Yes | Yes | Yes | No | No | No | |
Yes | No | Yes | Yes | No | No | |
Yes | No | No | No | Yes | No | |
No | No | Yes | Yes | No | Yes | |
No | No | No | No | Yes | Yes | |
No | No | No | Yes | No | Yes |
I want to fill the column 'TREATED' with 'True' if specific combinations of antibiotic columns contain 'Yes.' If the conditions aren't met, then I would like to fill the 'TREATED' column with a 'False' value.
If ['AZITH'] & ['CLIN'] == 'Yes' |
['AZITH'] & ['CFTX'] & ['CLIN'] == 'Yes' |
['AZITH'] & ['CFTX'] & ['METRO']== 'Yes' |
['AZITH'] & ['CFTN'] == 'Yes' |
['CFTX'] & ['DOXY'] & ['METRO']== 'Yes' |
['CFTN'] & ['DOXY'] == 'Yes' |
['DOXY'] & ['METRO']== 'Yes' ,
Then return 'True' in column 'TREATED'
Else 'False'
What I had in mind was some sort of if statement or use of lambda function, however, I am having trouble.
This must not be exclusive to the above combinations but also include for example if all 6 medications were given. If that's the case, then 'True' should be returned because the condition has been met to give at least 2 of the treatment medications.
The desired output is below:
AZITH | CLIN | CFTX | METRO | CFTN | DOXY | TREATED |
---|---|---|---|---|---|---|
Yes | Yes | No | No | No | No | Yes |
No | Yes | No | Yes | No | No | No |
Yes | Yes | No | No | No | No | Yes |
No | No | No | No | No | No | No |
Yes | Yes | Yes | Yes | Yes | Yes | Yes |
No | Yes | Yes | Yes | No | Yes | Yes |
No | No | No | No | No | Yes | No |
No | No | No | No | No | No | No |
Yes | Yes | Yes | No | No | No | Yes |
Yes | No | Yes | Yes | No | No | Yes |
Yes | No | No | No | Yes | No | Yes |
No | No | Yes | Yes | No | Yes | Yes |
No | No | No | No | Yes | Yes | Yes |
No | No | No | Yes | No | Yes | Yes |
With the dataframe you provided:
import pandas as pd
df = pd.DataFrame(
{
"AZITH": [
"Yes",
"No",
"Yes",
"No",
"Yes",
"No",
"No",
"No",
"Yes",
"Yes",
"Yes",
"No",
"No",
"No",
],
"CLIN": [
"Yes",
"Yes",
"Yes",
"No",
"Yes",
"Yes",
"No",
"No",
"Yes",
"No",
"No",
"No",
"No",
"No",
],
"CFTX": [
"No",
"No",
"No",
"No",
"Yes",
"Yes",
"No",
"No",
"Yes",
"Yes",
"No",
"Yes",
"No",
"No",
],
"METRO": [
"No",
"Yes",
"No",
"No",
"Yes",
"Yes",
"No",
"No",
"No",
"Yes",
"No",
"Yes",
"No",
"Yes",
],
"CFTN": [
"No",
"No",
"No",
"No",
"Yes",
"No",
"No",
"No",
"No",
"No",
"Yes",
"No",
"Yes",
"No",
],
"DOXY": [
"No",
"No",
"No",
"No",
"Yes",
"Yes",
"Yes",
"No",
"No",
"No",
"No",
"Yes",
"Yes",
"Yes",
],
}
)
Here is one way to do it:
mask = (
((df["AZITH"] == "Yes") & (df["CLIN"] == "Yes"))
| ((df["AZITH"] == "Yes") & (df["CLIN"] == "Yes") & (df["CFTX"] == "Yes"))
| ((df["AZITH"] == "Yes") & (df["CFTX"] == "Yes") & (df["METRO"] == "Yes"))
| ((df["AZITH"] == "Yes") & (df["CFTN"] == "Yes"))
| ((df["CFTX"] == "Yes") & (df["DOXY"] == "Yes") & (df["METRO"] == "Yes"))
| ((df["CFTN"] == "Yes") & (df["DOXY"] == "Yes"))
| ((df["DOXY"] == "Yes") & (df["METRO"] == "Yes"))
)
df.loc[mask, "TREATED"] = "Yes"
df = df.fillna("No")
Then:
print(df)
# Output
AZITH CLIN CFTX METRO CFTN DOXY TREATED
0 Yes Yes No No No No Yes
1 No Yes No Yes No No No
2 Yes Yes No No No No Yes
3 No No No No No No No
4 Yes Yes Yes Yes Yes Yes Yes
5 No Yes Yes Yes No Yes Yes
6 No No No No No Yes No
7 No No No No No No No
8 Yes Yes Yes No No No Yes
9 Yes No Yes Yes No No Yes
10 Yes No No No Yes No Yes
11 No No Yes Yes No Yes Yes
12 No No No No Yes Yes Yes
13 No No No Yes No Yes Yes