Tags: machine-learning, lstm, one-hot-encoding

Encoding two categorical columns in the same dataset for deep learning


I have a dataset with two categorical columns, REASON NO and Issue.

I tried to one-hot encode them like this:

enc = OneHotEncoder()
reason_no_enc = enc.fit_transform(temp['REASON NO'].values.reshape(-1, 1)).toarray()
issue_enc = enc.fit_transform(temp['Issue'].values.reshape(-1, 1)).toarray()

But I realized this creates a problem: only the later one, issue_enc, is treated as encoded, so when I try to inverse-transform reason_no_enc, it raises an error.

How can I handle this?


Solution

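  • You have to use a separate instance of OneHotEncoder (OHE) for each column, because calling fit_transform a second time refits the encoder and discards the categories it learned from the first column: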

    from sklearn.preprocessing import OneHotEncoder

    # fit encoder using 'REASON NO' data
    # later use this instance of OHE to decode 'REASON NO' data
    ohe_reason = OneHotEncoder()
    reason_no_enc = ohe_reason.fit_transform(temp['REASON NO'].values.reshape(-1, 1)).toarray()
    
    # fit encoder using 'Issue' data
    # later use this instance of OHE to decode 'Issue' data
    ohe_issue = OneHotEncoder()
    issue_enc = ohe_issue.fit_transform(temp['Issue'].values.reshape(-1, 1)).toarray()
    

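    Alternatively, you can use a single instance of OHE for both columns like this:

    # fit one encoder on both columns; it keeps the categories of each column separately
    # fit_transform returns a sparse matrix by default; call .toarray() for a dense array
    enc = OneHotEncoder()
    encoded_arr = enc.fit_transform(temp[['REASON NO', 'Issue']])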
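
To check that decoding works with either approach, here is a minimal round-trip sketch. The small `temp` DataFrame and its values are made up for illustration (the real data is not shown in the question), and `temp[['REASON NO']]` is used as shorthand for `temp['REASON NO'].values.reshape(-1, 1)`.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical stand-in for the `temp` DataFrame from the question
temp = pd.DataFrame({
    'REASON NO': [1, 2, 3, 1],
    'Issue': ['late', 'damaged', 'late', 'missing'],
})

# Approach 1: one encoder per column, each decoded with its own instance
ohe_reason = OneHotEncoder()
ohe_issue = OneHotEncoder()
reason_no_enc = ohe_reason.fit_transform(temp[['REASON NO']]).toarray()
issue_enc = ohe_issue.fit_transform(temp[['Issue']]).toarray()

reason_decoded = ohe_reason.inverse_transform(reason_no_enc).ravel()
issue_decoded = ohe_issue.inverse_transform(issue_enc).ravel()

# Approach 2: one encoder fitted on both columns at once
enc = OneHotEncoder()
encoded_arr = enc.fit_transform(temp[['REASON NO', 'Issue']]).toarray()

# Decoding needs the full encoded array; the result has one column per original feature
both_decoded = enc.inverse_transform(encoded_arr)

print(reason_decoded)
print(issue_decoded)
print(both_decoded)
```

Note that with a single encoder, inverse_transform expects the encoded columns of both features together, so you cannot pass it reason_no_enc or issue_enc on their own. If you need to decode the columns independently, separate encoders are the more convenient choice.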