
How to replace empty Series values with NaN in Python


I am iterating over a number of columns and, for each one, storing summary statistics such as mean, median, skewness and kurtosis in a dict, as below:

    metrics_dict['skewness'] = data_col.skew().values[0]
    metrics_dict['kurtosis'] = data_col.kurt().values[0]
    metrics_dict['mean'] = np.mean(data_col)[0]
    metrics_dict['median'] = np.median(data_col)

However, for some columns it raises the following error:

IndexError: index out of bounds

The column in question is below:

Index          device
61021           C:2
61022          D:3+
61023          D:3+
61024           B:1
61025          D:3+
61026           C:2 

I simply want to store NA in the dict for such a column without the error interrupting my loop. Here, Index is just the index of the DataFrame and the column being processed is device. Note that the data has a large number of numeric columns (~500), while only 2 or 3 columns are like device, so for these I just need to add NA to the dict and move on to the next column. How can I do that in Python?
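A minimal reproduction of where the IndexError can come from, under a few assumptions: `data_col` is taken to be a one-column DataFrame (inferred from `.skew().values[0]`), and the sample values are copied from the column shown above. When pandas leaves the non-numeric `device` column out of the reduction, `skew()` returns an empty Series, and `.values[0]` on an empty array raises IndexError. Passing `numeric_only=True` makes that dropping explicit; depending on the pandas version, calling `skew()` without it on a string column may instead raise a TypeError. `first_or_nan` is a hypothetical helper name that returns NaN when the result is empty, which is what the title asks for.

```python
import numpy as np
import pandas as pd

# Sample data mirroring the 'device' column from the question
df = pd.DataFrame(
    {"device": ["C:2", "D:3+", "D:3+", "B:1", "D:3+", "C:2"]},
    index=range(61021, 61027),
)
data_col = df[["device"]]  # assumption: data_col is a one-column DataFrame

# The object column is excluded from the reduction, so the result is empty
skew_result = data_col.skew(numeric_only=True)
print(skew_result.empty)

try:
    skew_result.values[0]  # indexing an empty array
except IndexError as exc:
    print("IndexError:", exc)


def first_or_nan(result: pd.Series) -> float:
    """Hypothetical helper: first value of a reduction result, or NaN if it is empty."""
    return result.iloc[0] if len(result) > 0 else np.nan


print(first_or_nan(skew_result))
```

Using `first_or_nan(data_col.skew(numeric_only=True))` in place of `data_col.skew().values[0]` would store NaN for columns like device instead of stopping the loop.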


Solution

  • These statistics are only meaningful for numeric columns, so one option is to identify the numeric columns first and store NA for everything else. You can do this with pd.DataFrame.select_dtypes:

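    import numpy as np

    numerics = ['int16', 'int32', 'int64', 'float16', 'float32', 'float64']
    
    numeric_cols = df.select_dtypes(include=numerics).columns
    
    metrics = {}
    for col in df:
        if col in numeric_cols:
            # numeric column: compute the statistics with pandas Series methods
            metrics[col] = {'skewness': df[col].skew(), 'kurtosis': df[col].kurt(),
                            'mean': df[col].mean(), 'median': df[col].median()}
        else:
            # non-numeric column (e.g. device): store NaN placeholders instead
            metrics[col] = dict.fromkeys(['skewness', 'kurtosis', 'mean', 'median'], np.nan)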
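A variant of the same idea, sketched under the assumption that you would rather not maintain a list of dtype names: check each column with `pandas.api.types.is_numeric_dtype` (which also accepts unsigned integer dtypes missing from the list above). The helper name `column_stats` and the small sample frame are illustrative only, not part of the original answer.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_numeric_dtype

STATS = ("skewness", "kurtosis", "mean", "median")


def column_stats(s: pd.Series) -> dict:
    """Hypothetical helper: summary statistics for one column.

    Non-numeric columns (such as 'device') get NaN for every statistic
    instead of raising, so a loop over many columns keeps going.
    """
    if not is_numeric_dtype(s):
        return dict.fromkeys(STATS, np.nan)
    return {
        "skewness": s.skew(),
        "kurtosis": s.kurt(),
        "mean": s.mean(),
        "median": s.median(),
    }


# Illustrative frame: one numeric column plus a 'device'-like column
df = pd.DataFrame({
    "x": [1.0, 2.0, 3.0, 4.0, 10.0],
    "device": ["C:2", "D:3+", "D:3+", "B:1", "C:2"],
})

metrics = {col: column_stats(df[col]) for col in df.columns}

# Transposing the dict-of-dicts gives one row per column, one column per statistic
summary = pd.DataFrame(metrics).T
print(summary)
```

Collecting the results into a single DataFrame like `summary` can be easier to inspect than ~500 separate dicts, and the non-numeric columns simply show up as rows of NaN.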