I am trying to implement a Naive Bayes model in R based on known information:
Age group, e.g. "18-24" and "25-34", etc.
Gender, "male" and "female"
Region, "London" and "Wales", etc.
Income, "£10,000 - £15,000", etc.
Job, "Full Time" and "Part Time", etc.
I am getting errors when I run it. My code is below:
library(readxl)
iphone <- read_excel("~/Documents/iPhone_1k.xlsx")
View(iphone)
summary(iphone)
iphone
library(caTools)
library(e1071)
set.seed(101)
sample = sample.split(iphone$Gender, SplitRatio = .7)
train = subset(iphone, sample == TRUE)
test = subset(iphone, sample == FALSE)
nB_model <- naiveBayes(Gender ~ Region + Retailer, data = train)
pred <- predict(nB_model, test, type="raw")
In the above scenario, I have an Excel file called iPhone_1k (1,000 entries for people who visited a website to buy an iPhone). Each row is one visitor, and the demographics listed above are known for each.
I have been trying to get the model to work by following the tutorial linked below, which uses only two variables (I would like to use at least 4, and more if possible):
https://rpubs.com/dvorakt/144238
I want to be able to use these demographics to predict which retailer they will go to (also known for each instance in the iPhone_1k file). There are only 3 options. Can you please advise how to complete this?
P.S. Below is a screenshot of a simplified version of the data I am using in R. Once I get some code to work, I'll expand the number of variables and entries.
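You are setting up the problem incorrectly. The variable you want to predict (Retailer) goes on the left-hand side of the formula and the predictors go on the right. It should be: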
naiveBayes(Retailer ~ Gender + Region + AgeGroup, data = train)
or, in short (the dot stands for every other column in train):
naiveBayes(Retailer ~ ., data = train)
You might also need to convert the columns to factors if they were read in as character vectors. You can do this for all columns, right after reading from Excel, with
iphone[] <- lapply(iphone, factor)
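Note that if you add numeric variables in the future, you should not apply this step to them: naiveBayes models numeric predictors with a Gaussian distribution, whereas converting them to factors would turn every distinct value into its own category.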
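Putting the pieces together, here is a minimal end-to-end sketch. It does not use your Excel file: the data frame is simulated, and the column names (AgeGroup, Job) and the retailer labels "A", "B", "C" are assumptions for illustration only. It also swaps caTools::sample.split for a plain base R sample() so the example needs only e1071; if you keep sample.split, split on iphone$Retailer rather than iphone$Gender, since Retailer is now the outcome.

```r
library(e1071)  # provides naiveBayes()

set.seed(101)

# Hypothetical stand-in for iPhone_1k.xlsx: column names and levels are
# assumptions for illustration, not the real data.
n <- 300
iphone <- data.frame(
  AgeGroup = sample(c("18-24", "25-34", "35-44"), n, replace = TRUE),
  Gender   = sample(c("male", "female"), n, replace = TRUE),
  Region   = sample(c("London", "Wales", "Scotland"), n, replace = TRUE),
  Job      = sample(c("Full Time", "Part Time"), n, replace = TRUE),
  Retailer = sample(c("A", "B", "C"), n, replace = TRUE),
  stringsAsFactors = FALSE
)

# Convert every column to a factor BEFORE splitting, so that train and test
# share the same factor levels.
iphone[] <- lapply(iphone, factor)

# 70/30 train/test split using base R (in place of caTools::sample.split)
train_idx <- sample(seq_len(n), size = round(0.7 * n))
train <- iphone[train_idx, ]
test  <- iphone[-train_idx, ]

# Retailer is the outcome; "." uses all remaining columns as predictors
nB_model <- naiveBayes(Retailer ~ ., data = train)

# type = "class" returns the predicted retailer for each row;
# type = "raw" returns one posterior probability per retailer
pred_class <- predict(nB_model, test, type = "class")
pred_prob  <- predict(nB_model, test, type = "raw")

# Confusion table of predicted vs. actual retailer on the held-out rows
table(predicted = pred_class, actual = test$Retailer)
```

Because the simulated Retailer column is pure noise, the confusion table here will look close to random; with your real data, replace the data.frame() call with read_excel() and keep the rest unchanged.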