Search code examples
pythonpandasdataframedata-preprocessing

is there a way, to count all rows which contain at least one '1' in a dataframe checking multiple named columns?


I have a dataset filled with Medicare beneficiaries. The question is: 'What proportion of patients have at least one of the chronic conditions described in the independent variables alzheimers, arthritis, cancer, copd, depression, diabetes, heart.failure, ihd, kidney, osteoporosis, and stroke?'

I tried creating a subset and using isnull() & any(), but i can´t get a proper solution.. also tried df.loc but it only lets me name one column..

i am attaching the dataset for better understanding!

https://drive.google.com/file/d/1R--YEsBCDHMXjqNzAumT2zzUAYvM1bWA/view?usp=sharing

Thanks!!

My Try´s:

claimss.loc[:, ["alzheimers","diabetes","arthritis"] == 1]

(Wanted to try it first with 3 columns, doesn´t work in the first place..)

Try with Subset:

filtered_df = df.loc[raw_df] == 1]

(Created a Subset where only index and independet variables(diseases) appear and tried to look for null-rows)


Solution

  • If need filter only some columns names use subset for filter columns names, compare by 1 by DataFrame.eq and last test at least one True by DataFrame.any:

    claimss[claimss[["alzheimers","diabetes","arthritis"]].eq(1).any(axis=1)]
    

    If need percentage use mean with boolean mask:

    out = claimss[["alzheimers","diabetes","arthritis"]].eq(1).any(axis=1).mean()