Search code examples
python-3.xvectorizationcategorical-data

Response coding for categorical data


Response coding is a technique to vectorize categorical data. Let us say, we have a categorical feature named 'grade_category' which has the following unique labels - ['grades_3_5', 'grades_prek_2', 'grades_9_12', 'grades_6_8']. Assume that we are working on a classification problem with target class-labels as 0 and 1

In response-coding, you have to output probability values for each label in our feature that the label occurs with a particular class-label E.g, grades_prek_2 = [probability it occurs with class_0, probability it occurs with class 1]


Solution

  • def response_coding(xtrain, ytrain, feature):
                """ this method will encode the categorical features 
                using response_coding technique. 
                args:
                    xtrain, ytrain, feature (all are ndarray)
                returns:
                    dictionary (dict)
                """
        
        dictionary = dict()
        x = PrettyTable()
        x = PrettyTable([feature, 'class 1', 'class 0'])
    
        unique_cat_labels = xtrain[feature].unique()
    
        for i in tqdm(range(len(unique_cat_labels))):
            total_count = xtrain.loc[:,feature][(xtrain[feature] == unique_cat_labels[i])].count()
            p_0 = xtrain.loc[:, feature][((xtrain[feature] == unique_cat_labels[i]) & (ytrain==0))].count()
            p_1 = xtrain.loc[:, feature][((xtrain[feature] == unique_cat_labels[i]) & (ytrain==1))].count()
    
            dictionary[unique_cat_labels[i]] = [p_1/total_count, p_0/total_count]
    
            row = []
            row.append(unique_cat_labels[i])
            row.append(p_1/total_count)
            row.append(p_0/total_count)
            x.add_row(row)
        print()
        print(x)[![enter image description here][1]][1]
        return dictionary
    

    enter image description here