python dataframe classification prediction

Considering two columns together in an analysis


I have a dataframe with three columns: rating, price, and currency. For example:

import pandas as pd

df = pd.DataFrame({
    'ratingvalue': ['5.0', '4.5', '2.0'],
    'pricerange': ['10000000', '899', '200'],
    'pricecurrency': ['45', '15', '20'],
})
# Each number in pricecurrency is a code that represents a currency such as EUR or USD.

I'm working on a prediction model that predicts the rating, and when considering prices we have to take the currency into account.

How can I use two columns as independent variables when creating the classifier?


Solution

  • I feel like the question lacks a little detail, because even a simple model like LinearRegression can of course take multiple features into account. So it's just a matter of passing both features into the model.
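
As a minimal sketch of "passing both features into the model", the snippet below selects both columns as the feature matrix and fits a classifier on them. The choice of `DecisionTreeClassifier` and the decision to treat each distinct rating string as a class label are assumptions made for illustration; the question does not say which estimator is used.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

df = pd.DataFrame({
    'ratingvalue': ['5.0', '4.5', '2.0'],
    'pricerange': ['10000000', '899', '200'],
    'pricecurrency': ['45', '15', '20'],
})

# Both columns together form the feature matrix X.
# The values are stored as strings, so convert them to numbers first.
X = df[['pricerange', 'pricecurrency']].astype(float)

# Assumption: each distinct rating is treated as one class label.
y = df['ratingvalue']

# Assumption: a decision tree stands in for whatever classifier is actually used.
clf = DecisionTreeClassifier(random_state=0)
clf.fit(X, y)

predictions = clf.predict(X)
```

Any scikit-learn estimator accepts a two-dimensional `X` in the same way, so swapping in a different model should only change the line that constructs `clf`.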

    In case your question was about how to make those features more useful, I'd suggest the following:

    1. If you have a limited number of currencies, just hardcode the conversion rates and convert all prices to a single currency.
    2. If there are many currencies, or you are not sure which code is which, you could use the currency codes as features directly, but then I'd suggest encoding them as one-hot features, since the numeric ordering of the codes might otherwise influence the predictions. For example, use OneHotEncoder to generate the features.
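
Both options above can be sketched roughly as follows. The conversion rates in `RATES_TO_BASE` are made-up placeholder values, and the names `price_base`, `onehot_df`, and the `currency_` column prefix are hypothetical; only `OneHotEncoder` itself comes from the suggestion above.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

df = pd.DataFrame({
    'pricerange': ['10000000', '899', '200'],
    'pricecurrency': ['45', '15', '20'],
})

# Option 1: hardcode conversion rates and convert every price to one currency.
# HYPOTHETICAL rates keyed by currency code; replace them with real values.
RATES_TO_BASE = {'45': 0.0001, '15': 1.0, '20': 1.1}
df['price_base'] = df['pricerange'].astype(float) * df['pricecurrency'].map(RATES_TO_BASE)

# Option 2: keep the raw price and one-hot encode the currency code,
# so the model does not read 45 > 20 > 15 as a meaningful ordering.
encoder = OneHotEncoder(handle_unknown='ignore')
onehot = encoder.fit_transform(df[['pricecurrency']]).toarray()
onehot_df = pd.DataFrame(
    onehot,
    columns=[f'currency_{code}' for code in encoder.categories_[0]],
    index=df.index,
)

# Feature matrix for option 2: numeric price plus one indicator column per currency.
X = pd.concat([df[['pricerange']].astype(float), onehot_df], axis=1)
```

With option 1 the model only needs the single `price_base` column, while option 2 leaves it to the model to learn how price and currency interact.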

    Again, I don't know many details of the task, but if it's not simple, you'd need more features than that to get a good prediction, so maybe consider adding some other columns as well.