Tags: python-3.x, dataframe, machine-learning, kaggle

How to iterate over a DataFrame for a selected column using Python?


Suppose I have sample code like this:

_d=pd.DataFrame([[1,2,3],[4,np.nan,6],[np.nan,np.nan,8]],columns=['x','y','z'])

Now I have a function that checks a value and returns the desired value depending on the scenario:

def handling_nan(_d):
    if _d['x']==1.0:
        return 100
    else:
        return _d

When I use it in the code below,

_result=_d.apply(lambda x:handling_nan(x))
_result

I get this error:

KeyError: ('x', 'occurred at index x')

UPDATE A:

In short, I am using the Kaggle dataset "Titanic: Machine Learning from Disaster" (kaggle.com), and I want to add a new column based on a condition like this:

If the passenger is male and the age is NaN, insert the mean age of the male passengers instead of NaN; if the passenger is female and the age is NaN, insert the mean age of the female passengers instead of NaN.


Solution

  • The KeyError is raised inside the function because DataFrame.apply() defaults to axis=0, so the function is called once per column rather than once per row. Each column arrives as a Series indexed by the row labels (0, 1, 2), which has no 'x' key. To fix the error, pass axis=1 so that the function receives each row:

    _result=_d.apply(lambda x:handling_nan(x), axis=1)
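
To make the difference between the two axes concrete, here is a minimal sketch using the DataFrame from the question. The helper `replace_x` is a hypothetical variant of `handling_nan` that always returns a scalar; it is an assumption for illustration, not part of the original code.

```python
import numpy as np
import pandas as pd

_d = pd.DataFrame(
    [[1, 2, 3], [4, np.nan, 6], [np.nan, np.nan, 8]],
    columns=["x", "y", "z"],
)

# axis=0 (default): each call receives one column, indexed by row labels.
labels_axis0 = _d.apply(lambda s: ",".join(map(str, s.index)))

# axis=1: each call receives one row, indexed by column names.
labels_axis1 = _d.apply(lambda row: ",".join(row.index), axis=1)


def replace_x(row):
    # Hypothetical scalar-returning variant of handling_nan.
    return 100 if row["x"] == 1.0 else row["x"]


# Column-wise, the lookup row["x"] fails because 'x' is not a row label.
try:
    _d.apply(replace_x)
    raised = False
except KeyError:
    raised = True

# Row-wise, the lookup succeeds and one value per row comes back.
result = _d.apply(replace_x, axis=1)
print(labels_axis0.tolist())  # ['0,1,2', '0,1,2', '0,1,2']
print(labels_axis1.tolist())  # ['x,y,z', 'x,y,z', 'x,y,z']
print(raised)                 # True
```

For a single-column check like this one, a vectorized expression such as `_d["x"].where(_d["x"] != 1.0, 100)` would avoid the row-by-row `apply` altogether.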
    

    The update to the question asks for something different: replacing NaNs with the mean of each group.

    This can be done by combining fillna() with groupby() and transform(), as follows:

    
    import pandas as pd

    l = [["M", 30], ["M", 45], ["M", None], ["F", 76], ["F", 23], ["F", None]]
    df = pd.DataFrame(l, columns=["sex", "age"])
    # Fill each missing age with the mean age of that row's "sex" group.
    df['age'] = df['age'].fillna(df.groupby("sex")['age'].transform('mean'))
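
It may help to look at the intermediate values. The sketch below splits the one-liner into two steps on the same sample data; the variable names `group_mean` and `filled` are introduced here only for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    [["M", 30], ["M", 45], ["M", None], ["F", 76], ["F", 23], ["F", None]],
    columns=["sex", "age"],
)

# transform("mean") returns a Series aligned with df's index that holds,
# for every row, the mean age of that row's group (NaN values are skipped).
group_mean = df.groupby("sex")["age"].transform("mean")
print(group_mean.tolist())  # [37.5, 37.5, 37.5, 49.5, 49.5, 49.5]

# fillna() with a Series fills each NaN from the value at the same index.
filled = df["age"].fillna(group_mean)
print(filled.tolist())  # [30.0, 45.0, 37.5, 76.0, 23.0, 49.5]
```

Because the group means line up with the original index, only the missing ages change; the known ages are left as they were.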
    
    

    This answer has other alternative solutions.

    Hope this helps.