Tags: python, pandas, dataframe, nan, dictionary-comprehension

Why do I get the std for one column while the others are NaN?


I have a DataFrame with shape (345, 5) that looks something like this:

| something1 | something2 | numbers1 | number2 | number3 |
|------------|------------|----------|---------|---------|
| A          | str        | 45       | nan     | nan     |
| B          | str2       | 6        | nan     | nan     |
| c          | str3       | 34       | 67      | 45      |
| D          | str4       | 56       | 45      | 23      |

I want to compute the standard deviation of the numeric columns only, using my own std function, and save the results in a dictionary. The problem is that I get a result for the first column only:

{'number1': 18.59267328815305,
 'number2': nan,
 'number3': nan,
 'number4': nan}

Here is my code:

std = {column:std_func(df[column].values) for column in df.columns}

Solution

  • Pandas can handle this directly. Try this instead:

    df[['numbers1', 'numbers2', 'numbers3']].std()
    

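    By default, NaNs are skipped (skipna=True): https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.std.html

    Note that DataFrame.std uses ddof=1 (sample standard deviation) by default, so pass ddof=0 if your own function divides by N.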

    If you want the result as a dict:

    df[['numbers1', 'numbers2', 'numbers3']].std().to_dict()
    

    Edit: if you are set on using your own standard deviation function, drop the NaNs from each column before applying it. NaN propagates through arithmetic, so a single NaN in a column makes the whole result NaN:

    std = {column:std_func(df[column].dropna().values) for column in df.columns}
    

    https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.dropna.html
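
To make the difference concrete, here is a minimal sketch of both approaches on the four sample rows from the question. The std_func below is only a hypothetical stand-in for the asker's function (it divides by N, which gives roughly the 18.59 shown in the question for the first numeric column). The column names numbers1, numbers2, numbers3 follow the answer's code, and select_dtypes is used here as one assumed way of picking out the numeric columns.

```python
import numpy as np
import pandas as pd

# Toy frame mirroring the four sample rows from the question.
df = pd.DataFrame({
    "something1": ["A", "B", "c", "D"],
    "something2": ["str", "str2", "str3", "str4"],
    "numbers1": [45, 6, 34, 56],
    "numbers2": [np.nan, np.nan, 67, 45],
    "numbers3": [np.nan, np.nan, 45, 23],
})


def std_func(values):
    """Hypothetical stand-in for the asker's function: population std (divide by N)."""
    values = np.asarray(values, dtype=float)
    mean = values.sum() / len(values)          # becomes NaN if any value is NaN
    return float(np.sqrt(((values - mean) ** 2).sum() / len(values)))


# Assumption: restrict to numeric columns so the string columns are never passed in.
numeric = df.select_dtypes(include="number")

# Without dropna, any NaN in a column propagates into the result.
with_nan = {col: std_func(numeric[col].values) for col in numeric.columns}

# Dropping NaNs per column first gives a number for every column.
without_nan = {col: std_func(numeric[col].dropna().values) for col in numeric.columns}

# Built-in pandas version: NaNs are skipped by default.
pandas_sample = numeric.std().to_dict()            # ddof=1 (divide by N - 1)
pandas_population = numeric.std(ddof=0).to_dict()  # ddof=0 (divide by N)

print(with_nan)
print(without_nan)
print(pandas_sample)
print(pandas_population)
```

On this data the dropna version of the custom function and numeric.std(ddof=0) agree, while the default numeric.std() comes out larger because it divides by N - 1.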