Tags: r, statistics, regularized

Trying to perform LDA with LASSO


PenalizedLDA(x = train_x, y = train_y) returns:

Error in sort.int(x, na.last = na.last, decreasing = decreasing, ...) : 'x' must be atomic

I'm trying to use linear discriminant analysis (LDA) with a lasso penalty on the Spambase dataset from the UCI repository. (I've added headers to the columns and, where appropriate, rescaled the columns to the interval [0, 1].)

The first time I ran the code, it gave this error:

Error in PenalizedLDA(x = train_x, y = train_y) : y must be a numeric vector, with values as follows: 1, 2, ....

I got past that error by passing train_y as:

train_y = as.list.numeric_version(training_set[,58])

When I ran it again, I got this error:

Error in sort.int(x, na.last = na.last, decreasing = decreasing, ...) : 'x' must be atomic

This is where I'm stuck.
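Both messages can be reproduced in base R without penalizedLDA. The sketch below uses a hypothetical 0/1 label vector; `labels_ok()` is a hypothetical, hand-written approximation of the requirement stated in the first error message (labels exactly 1, 2, ..., nclasses), not the package's own check.

```r
# Hypothetical 0/1 labels standing in for column 58 (spam / not spam).
y01 <- c(0, 1, 1, 0, 1)

# Hypothetical helper approximating the "1, 2, ..., nclasses" requirement:
# an atomic numeric vector whose sorted unique values are exactly 1..k.
labels_ok <- function(y) {
  is.atomic(y) && is.numeric(y) &&
    all(sort(unique(y)) == seq_along(unique(y)))
}

labels_ok(y01)      # FALSE: labels start at 0 (the first error)
labels_ok(y01 + 1)  # TRUE: labels are 1 and 2

# Wrapping the labels in a list makes them non-atomic; sort() on a plain
# list is passed to sort.int(), which rejects it (the second error).
y_list <- as.list(y01 + 1)
msg <- tryCatch(sort(y_list), error = function(e) conditionMessage(e))
print(msg)
```

In other words, the first workaround replaced a "wrong labels" problem with a "wrong type" problem: the labels need to be shifted to 1 and 2 *and* stay an atomic vector.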

library(penalizedLDA)
library(caTools)  # provides sample.split()
data = read.csv("spambase.csv",header = TRUE)

new_data = data/100
new_data[,c(55,56,57,58)] = data[,c(55,56,57,58)]
new_data[,58]= factor(new_data[,58])

# Splitting dataset into Training set and Test set


set.seed(seeds)
split = sample.split(new_data$factor, SplitRatio = 0.7)
training_set = subset(new_data, split == TRUE)
test_set = subset(new_data, split == FALSE)

#scale data

training_set[-58] = scale(training_set[,-58])
test_set[-58] = scale(test_set[,-58])

train_x =training_set[,-58]
train_y =as.list.numeric_version(training_set[,58])
#Sparse linear discriminant Analysis
classifier = PenalizedLDA( x = training_set[,-58], y =training_set[,58],K = 1,lambda = "standard")

Solution

  • According to the help page of PenalizedLDA(), its argument y should be:

    A n-vector containing the class labels. Should be coded as 1, 2, . . . , nclasses, where nclasses is the number of classes.

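    This means that the class labels of the variable of interest (column 58 in your case) must start at 1, not 0. Moreover, don't use the function as.list.numeric_version(), because it creates a list, whereas a vector is required; calling sort() on a list is what produces the "'x' must be atomic" error. Finally, lambda must be a numeric penalty value, not "standard": "standard" is a value of the separate type argument, which selects the lasso penalty.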

    library(penalizedLDA)
    data = read.csv("...")
    
    new_data = data/100
    new_data[,c(55,56,57,58)] = data[,c(55,56,57,58)]
    new_data[,58] = factor(new_data[,58] + 1)  # in order to start at 1 and not 0
    new_data[-58] = scale(new_data[,-58])
    
    classifier = PenalizedLDA(x = new_data[,-58], y = new_data[,58], K = 1, lambda = .1)
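Rather than fixing lambda = .1 by hand, the penalty can be chosen by cross-validation and the model checked on held-out rows, which is closer to the train/test setup in the question. The sketch below assumes that the package's PenalizedLDA.cv() function (with its bestlambda and bestK results) and the xte argument / ypred output of PenalizedLDA() behave as described in the package documentation. It uses simulated data in place of spambase.csv and a base-R row split instead of caTools::sample.split(); the data sizes, the signal strength and the lambda grid are arbitrary assumptions.

```r
library(penalizedLDA)

set.seed(1)

# Simulated stand-in for spambase (hypothetical data): 200 rows, 30 features.
n <- 200
p <- 30
x <- matrix(rnorm(n * p), nrow = n, ncol = p)
spam <- factor(rep(c(0, 1), length.out = n))     # 0/1 labels stored as a factor
x[spam == "1", 1:5] <- x[spam == "1", 1:5] + 1   # only features 1-5 carry signal

# as.integer() on a factor returns its level codes, so levels "0","1" become 1, 2.
# This yields a plain integer vector, not a list.
y <- as.integer(spam)

# 70/30 split by row index.
train_idx <- sample(seq_len(n), size = round(0.7 * n))
x_train <- x[train_idx, ]
y_train <- y[train_idx]
x_test  <- x[-train_idx, ]
y_test  <- y[-train_idx]

# Choose the lasso penalty (and number of components) by cross-validation
# on the training rows only.
cv_out <- PenalizedLDA.cv(x_train, y_train,
                          lambdas = c(1e-4, 1e-3, 1e-2, 0.1, 1))

# Refit with the selected values; xte asks for predictions on the test rows.
fit <- PenalizedLDA(x_train, y_train, xte = x_test, type = "standard",
                    lambda = cv_out$bestlambda, K = cv_out$bestK)

# Column k of ypred holds test predictions using the first k discriminant vectors.
pred <- fit$ypred[, cv_out$bestK]
accuracy <- mean(pred == y_test)
print(accuracy)
```

With two classes, at most one discriminant vector can be fitted (K must be smaller than the number of classes), so bestK should be 1 here.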