python, scikit-learn, one-hot-encoding

One-hot encoder: how to encode multiple values of the same category?


I am going to predict the box office of a movie. Assume there is only one categorical feature, "actors", with values "A", "B" and "C", which I encode as [1,0,0], [0,1,0] and [0,0,1]. What if a movie has multiple actors, for example both A and B? Should I encode it as [1,1,0] or as [1,0,0,0,1,0]?
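To make the two options concrete, here is a minimal sketch that builds both vectors for a movie with actors A and B. The `ACTORS` list and the helper names are hypothetical. The multi-hot layout always has one column per actor, while the concatenated layout needs one 3-slot block per listed actor, so its length depends on how many actors the movie has and its contents change when the same actors are listed in a different order.

```python
# Hypothetical vocabulary of actors; one column per actor.
ACTORS = ["A", "B", "C"]

def one_hot(actor):
    """One-hot vector for a single actor."""
    return [1 if a == actor else 0 for a in ACTORS]

def multi_hot(actors):
    """Fixed-length vector with a 1 for every actor in the movie."""
    present = set(actors)
    return [1 if a in present else 0 for a in ACTORS]

def concatenated(actors):
    """One one-hot block per listed actor, joined end to end (variable length)."""
    vec = []
    for actor in actors:
        vec.extend(one_hot(actor))
    return vec

print(multi_hot(["A", "B"]))     # [1, 1, 0]
print(concatenated(["A", "B"]))  # [1, 0, 0, 0, 1, 0]
print(concatenated(["B", "A"]))  # [0, 1, 0, 1, 0, 0]
```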


Solution

  • Represent each actor as an integer with its own bit set, then OR the integers together to combine several actors:

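    # Each actor gets its own bit position
    A = int("100", 2)  # 4
    B = int("010", 2)  # 2
    C = int("001", 2)  # 1
    print(A, B, C)      # 4 2 1
    movie = A | B       # a movie featuring both A and B
    print(movie)        # 6
    print(bin(movie))   # 0b110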
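A model expects a feature vector rather than a packed integer, so the bitmask still has to be unpacked into one column per actor. A minimal sketch, assuming the same bit layout as above (A is the highest bit, C the lowest); `ACTORS`, `BITS`, `encode` and `to_vector` are hypothetical names:

```python
ACTORS = ["A", "B", "C"]  # hypothetical vocabulary; A is the most significant bit

# Bit value for each actor: A=0b100, B=0b010, C=0b001
BITS = {name: 1 << (len(ACTORS) - 1 - i) for i, name in enumerate(ACTORS)}

def encode(actors):
    """OR together the bit of every actor in the movie."""
    mask = 0
    for name in actors:
        mask |= BITS[name]
    return mask

def to_vector(mask):
    """Unpack the bitmask into a 0/1 list, one entry per actor."""
    return [1 if mask & BITS[name] else 0 for name in ACTORS]

movie = encode(["A", "B"])
print(bin(movie))        # 0b110
print(to_vector(movie))  # [1, 1, 0]
```

The unpacked result is the fixed-length vector [1,1,0], one column per actor, rather than the concatenated [1,0,0,0,1,0].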