Handling ID variables and Factors in R


I have this dataset and I want to build some models and compare them.

However, I'm quite confused about how I should handle the product ID independent variable.

All variables are numeric, but the product ID variable is an int, as shown below:

Data set


str(data)

'data.frame':   16 obs. of  6 variables:
 $ Productid: int  1 2 3 4 5 6 7 8 9 10 ...
 $ x1       : num  6.21 7.75 7.21 8.33 4.87 5.09 6.04 6.09 6.08 6.17 ...
 $ x2       : num  7.08 3.29 4.38 2.79 7.71 7.5 6.58 5.13 5.5 5.58 ...
 $ x3       : num  2 1.54 1.79 1.63 1.96 2.13 2.04 2 2.09 2.13 ...
 $ x4       : num  2.54 2.26 2.58 2.71 1.7 2.42 2.04 2.42 2.46 2.48 ...
 $ Y        : num  4.97 6.98 4.58 6.45 4.33 4.26 6.16 6.26 5.83 5.74 ...

How should I handle this product ID? Should I use one-hot encoding?

And if the solution is to transform it into a factor, which ML algorithms accept factors?


Solution

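  • The ID is there only to identify a product and has no effect on the dependent variable, so it should not be included in any model. If each row has its own ID, as the str() output suggests, converting it to a factor or one-hot encoding it would give every observation its own level, and the model would simply memorize the training rows.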
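
A minimal sketch of dropping the ID before fitting, assuming a data frame with the same column names as the str() output above. The values are random placeholders rather than the asker's data, and lm() is used only as an illustrative model, not as a recommendation:

```r
# Hypothetical data with the same column layout as in the question;
# the numbers are random placeholders, not the original dataset.
set.seed(1)
data <- data.frame(
  Productid = 1:16,
  x1 = runif(16, 4, 9),
  x2 = runif(16, 2, 8),
  x3 = runif(16, 1.5, 2.2),
  x4 = runif(16, 1.7, 2.8),
  Y  = runif(16, 4, 7)
)

# Keep the ID aside for labelling, but remove it from the modelling data.
ids <- data$Productid
model_data <- data[, setdiff(names(data), "Productid")]

# With the ID gone, Y ~ . expands to x1 + x2 + x3 + x4 only.
fit <- lm(Y ~ ., data = model_data)

# Re-attach the ID to the fitted values so each row can be traced to a product.
results <- data.frame(Productid = ids, fitted = unname(fitted(fit)))
```

Keeping `ids` in a separate vector means the identifier is still available for reporting, while the formula never sees it as a predictor.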