
Sweep a log-dnorm across a training set matrix to find log-likelihood


As part of a machine learning class assignment, I am implementing a Naive Bayes classifier without using any external packages.

My training data set X has 800 rows, each with 8 features and one binary label. For each class I have calculated length-8 vectors holding the mean and standard deviation of every feature, along with the priors for the two classes.
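A minimal base-R sketch of the per-class estimates described above, assuming the features are in a numeric matrix `X` and the labels in a 0/1 vector called `label` (a hypothetical name; the data below are simulated, not the asker's). Note that `sd()` divides by n - 1, whereas the maximum-likelihood estimate divides by n; with hundreds of rows per class the difference is small.

```r
# Hypothetical simulated data standing in for the 800 x 8 training set:
# X is a numeric feature matrix, `label` a binary (0/1) class vector.
set.seed(42)
n <- 800; p <- 8
X <- matrix(rnorm(n * p), nrow = n, ncol = p)
label <- rbinom(n, size = 1, prob = 0.4)

# Per-class parameters: a length-p vector of means and of sds, plus a prior.
classes <- sort(unique(label))
params <- lapply(classes, function(k) {
  Xk <- X[label == k, , drop = FALSE]   # rows belonging to class k
  list(
    mean  = colMeans(Xk),               # mean_j for class k
    sd    = apply(Xk, 2, sd),           # sd_j for class k (n - 1 denominator)
    prior = mean(label == k)            # fraction of rows in class k
  )
})
names(params) <- classes                # access as params[["0"]], params[["1"]]
```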

To assess the classifier's accuracy on the training set, I want to generate a matrix Y with the same dimensions as X (i = 1..800, j = 1..8), in which each element y_ij is given by

y_ij = dnorm(x_ij, mean = mean_j, sd = sd_j, log = TRUE)
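Because `dnorm()` is vectorized over `x`, `mean` and `sd`, the whole of Y can also be computed in a single call. This is a sketch with toy values: `mean_j` and `sd_j` are hypothetical parameter vectors, not the asker's estimates. Since R stores a matrix column by column, repeating each parameter `nrow(X)` times with `rep(..., each = )` lines it up with column j.

```r
# Toy stand-ins (hypothetical): a 5 x 3 matrix and per-column parameters.
set.seed(1)
X <- matrix(rnorm(15), nrow = 5, ncol = 3)
mean_j <- c(0, 1, -1)
sd_j   <- c(1, 2, 0.5)

# dnorm() recycles mean and sd element by element; rep(..., each = nrow(X))
# expands them so that element (i, j) uses mean_j[j] and sd_j[j].
Y <- matrix(
  dnorm(X,
        mean = rep(mean_j, each = nrow(X)),
        sd   = rep(sd_j,   each = nrow(X)),
        log  = TRUE),
  nrow = nrow(X), ncol = ncol(X)
)
```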

I have tried sweep, apply, and lapply without success. I am stuck, and unfortunately this is a problem of familiarity with R rather than of understanding the algorithm. Any help is greatly appreciated.
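On `sweep()` specifically: it expands only its single `STATS` vector to match the matrix, and any extra arguments passed through `...` (such as an `sd` vector for `dnorm()`) reach `FUN` unexpanded, so they are not matched to columns. One workaround, sketched here with the same hypothetical toy values, is to sweep out the mean and sd to get z-scores and then apply the closed-form normal log density, log N(x; mu, sigma) = -z^2/2 - log(sigma) - log(2*pi)/2.

```r
# Toy stand-ins (hypothetical): a 5 x 3 matrix and per-column parameters.
set.seed(1)
X <- matrix(rnorm(15), nrow = 5, ncol = 3)
mean_j <- c(0, 1, -1)
sd_j   <- c(1, 2, 0.5)

# Standardize each column: z_ij = (x_ij - mean_j) / sd_j
Z <- sweep(sweep(X, 2, mean_j, "-"), 2, sd_j, "/")

# Closed-form log density: subtract the per-column constant
# log(sd_j) + log(2 * pi) / 2 from -z^2 / 2.
Y <- sweep(-0.5 * Z^2, 2, log(sd_j) + 0.5 * log(2 * pi), "-")
```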


Solution

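  • There's probably a better data setup for this, but if you already have X and two vectors of per-feature means and standard deviations, xmean and xsd, you can use sapply, which evaluates dnorm one column at a time and binds the results into a matrix with the same dimensions as X. Here's a reproducible example: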

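    set.seed(1)                            # reproducible toy data
    X <- matrix(rnorm(30), 10, 3)
    xmean <- apply(X, 2, mean)             # per-column means
    xsd <- apply(X, 2, sd)                 # per-column standard deviations
    # one column of log densities per feature; sapply binds them into a 10 x 3 matrix
    sapply(seq_len(ncol(X)), function(j) {
      dnorm(X[, j], mean = xmean[j], sd = xsd[j], log = TRUE)
    })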
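
To get from Y to the training-set accuracy the question is after, the Naive Bayes step is to sum each row of a class's log-likelihood matrix (the features are assumed independent given the class), add that class's log prior, and predict the class with the larger score. A sketch on simulated data, assuming a 0/1 `label` vector; `loglik_matrix()` is a hypothetical helper wrapping the sapply call above.

```r
# Hypothetical simulated data: class 1 is shifted by +1 in every feature.
set.seed(7)
n <- 200; p <- 3
label <- rbinom(n, 1, 0.5)
X <- matrix(rnorm(n * p), n, p) + label   # label recycles down each column

# Column-wise log-density matrix, as in the sapply answer
loglik_matrix <- function(X, m, s) {
  sapply(seq_len(ncol(X)), function(j) dnorm(X[, j], m[j], s[j], log = TRUE))
}

# One column of scores per class: sum of per-feature log densities + log prior
scores <- sapply(c(0, 1), function(k) {
  Xk <- X[label == k, , drop = FALSE]
  m <- colMeans(Xk)
  s <- apply(Xk, 2, sd)
  prior <- mean(label == k)
  rowSums(loglik_matrix(X, m, s)) + log(prior)
})

pred <- c(0, 1)[max.col(scores, ties.method = "first")]  # class with the larger score
accuracy <- mean(pred == label)
```

Working on the log scale avoids the underflow that multiplying many small densities together would cause.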
    
