
I am trying to access a column in a DataFrame, manipulate it, and create a new column in the DataFrame from the result:


x = onefile1['quiz1']
grading = []
for i in x :
    if i == '-':
        grading.append(0)

    elif float(i) < float(50.0):
        grading.append('lessthen50')

    elif i > 50.0 and i < 60.0:
        grading.append('between50to60')

    elif i > 60.0 and i < 70.0:
        grading.append('between60to70')


    elif i > 70.0 and i < 80.0:
        grading.append('between70to80')

    elif i  > 80.0:
        grading.append('morethen80')

    else:
        grading.append(0) 

onefile1 = file.reset_index()
onefile1['grade'] = grading

It gives me the following error:

Length of values does not match length of index
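For context, here is a minimal sketch of how pandas produces this message. It uses a small hypothetical frame (df) rather than the poster's data. Assigning a plain Python list to a column only works when the list has exactly one value per row of that frame.

```python
import pandas as pd

# Hypothetical data: a 3-row frame and a list holding only 2 labels.
df = pd.DataFrame({"quiz1": [40.0, 55.0, 90.0]})
too_short = ["lessthen50", "between50to60"]

message = ""
try:
    # Assigning a list to a column requires len(list) == len(df.index).
    df["grade"] = too_short
except ValueError as err:
    message = str(err)
    print(message)

# A list with one entry per row of the *same* frame assigns cleanly.
df["grade"] = ["lessthen50", "between50to60", "morethen80"]
print(df)
```

The exact wording depends on the pandas version. Newer releases also print both lengths, for example "(2)" and "(3)".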


Solution

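  • You probably have values exactly equal to 50, 60, 70 or 80. None of your conditions match them, so they fall through to the else branch and are graded 0. Every branch appends exactly one value, so the length error itself most likely comes from onefile1 = file.reset_index(). That line replaces onefile1 after grading was built from it, and if file has a different number of rows, the assignment fails. You can use <= instead of < in the loop, or use pd.cut from pandas: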

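    import numpy as np
    import pandas as pd
    
    # Treat '-' (no score) as 0, then convert the whole column to float
    onefile1['quiz1'] = (onefile1['quiz1']
                            .astype(str).str.replace('-', '0')
                            .astype(float))
    
    # One label per bin: there must be one label fewer than bin edges
    labels = [
        0, 'lessthen50', 'between50to60', 
        'between60to70', 'between70to80', 'morethen80'
    ]
    
    # Bins are (-1, 0], (0, 50], (50, 60], (60, 70], (70, 80], (80, inf);
    # include_lowest=True also puts -1 itself in the first bin
    bins = [-1, 0, 50, 60, 70, 80, np.inf]
    onefile1['grade'] = pd.cut(
        onefile1.quiz1, bins=bins, 
        labels=labels, include_lowest=True)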
    

    Here is an example:

    >>> import numpy as np
    >>> import pandas as pd
    >>> onefile1 = pd.DataFrame({'quiz1': [0, 40, 30, 60, 80, 100, '-']})
    >>> onefile1['quiz1'] = (onefile1['quiz1']
    ...                         .astype(str).str.replace('-', '0')
    ...                         .astype(float))
    >>> labels = [
    ...     0, 'lessthen50', 'between50to60',
    ...     'between60to70', 'between70to80', 'morethen80'
    ... ]
    >>> bins = [-1, 0, 50, 60, 70, 80, np.inf]
    >>> onefile1['grade'] = pd.cut(
    ...     onefile1.quiz1, bins=bins,
    ...     labels=labels, include_lowest=True)
    >>> onefile1
       quiz1          grade
    0    0.0              0
    1   40.0     lessthen50
    2   30.0     lessthen50
    3   60.0  between50to60
    4   80.0  between70to80
    5  100.0     morethen80
    6    0.0              0
    

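    PS: Check the include_lowest and right parameters of pd.cut before use. With the default right=True, each bin includes its right edge, so a score of exactly 50 is labelled lessthen50 and 60 is labelled between50to60 (as in row 3 above). include_lowest=True additionally makes the first bin include its left edge (-1).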
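For comparison, here is a minimal sketch of the <= approach mentioned above. Instead of building a separate list, it uses a function applied with Series.map. The function name grade is hypothetical. It assumes that '-' means "no score" and that scores at or below 0 get the label 0. Boundaries use <=, so for the scores in the example it yields the same labels as the pd.cut version.

```python
import pandas as pd


def grade(score):
    """Map one quiz1 value to a label; '-' is treated as no score (0).

    The upper bounds use <=, so boundary values such as 50, 60, 70 and 80
    land in a bin (the same right-inclusive convention as pd.cut's default).
    """
    if score == "-":
        return 0
    score = float(score)
    if score <= 0:
        return 0
    if score <= 50:
        return "lessthen50"
    if score <= 60:
        return "between50to60"
    if score <= 70:
        return "between60to70"
    if score <= 80:
        return "between70to80"
    return "morethen80"


# Hypothetical data, the same values as in the example above.
onefile1 = pd.DataFrame({"quiz1": [0, 40, 30, 60, 80, 100, "-"]})
# map returns one label per row of this frame, so the lengths always match.
onefile1["grade"] = onefile1["quiz1"].map(grade)
print(onefile1)
```

Because map produces exactly one result per row of the same frame, the new column cannot end up with a different length than the index. A list built from one frame and then assigned to another can.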