r, statistics

Error when finding quartiles and outliers in RStudio


I am working on a university project and trying to find outliers in a dataset with 722 rows and 9 columns, so there are over 6,000 values (722 × 9 = 6,498).

I have tried several approaches.

First, with z-scores, which produced this error:

z_scores<-(concrete_strength_test-mean(concrete_strength_test))/sd(concrete_strength_test)
Error in is.data.frame(x) : 
  'list' object cannot be coerced to type 'double'
In addition: Warning message:
In mean.default(concrete_strength_test) :
  argument is not numeric or logical: returning NA

Second, with quantiles:

Q1 <- quantile(concrete_strength_test, .25)
Error in quantile.default(concrete_strength_test, 0.25) : 
  missing values and NaN's not allowed if 'na.rm' is FALSE

Because there were missing (NA) values, I imputed them with the mice function, but then I got this error:

Q1 <- quantile(concrete_strength_test_imputed, .25)
Error in xtfrm.data.frame(x) : cannot xtfrm data frames

I'm just not sure where to go next so any help would be greatly appreciated.


Solution

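  • If I understand you correctly, you want to treat the 9 columns as one big bag of values rather than finding the quantiles and outliers in each column separately?

    If so, the data structure of concrete_strength_test is the problem. It is a data.frame (internally a list of columns), but mean(), sd() and quantile() expect a numeric vector. That is why mean() returns NA with a warning, sd() fails with "'list' object cannot be coerced to type 'double'", and quantile() fails with "cannot xtfrm data frames". Convert the data frame to a vector first. Then you can do all kinds of analyses: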

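    # Flatten all columns into one numeric vector (assumes every column is numeric)
    values <- c(as.matrix(concrete_strength_test))

    # na.rm = TRUE skips missing values; without it mean() and sd() return NA
    z_scores <- (values - mean(values, na.rm = TRUE)) / sd(values, na.rm = TRUE)
    q1 <- quantile(values, 0.25, na.rm = TRUE)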
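
    To go from the pooled values to actual outliers, something like the following might work. It is only a sketch: the small data frame df is a made-up stand-in for concrete_strength_test, and the cutoffs (1.5 × IQR, |z| > 3) are common conventions rather than anything taken from your project, so adjust them as needed. The last line shows the per-column alternative in case you do not want to pool the columns after all.

```r
# Hypothetical stand-in for concrete_strength_test (all columns numeric, one NA)
df <- data.frame(
  a = c(10, 12, 11, 13, 12, 11, 100),  # 100 is a planted outlier
  b = c(9, 10, NA, 12, 11, 10, 12)
)

# Pool every column into one numeric vector and drop missing values
values <- c(as.matrix(df))
values <- values[!is.na(values)]

# IQR rule: flag values more than 1.5 * IQR outside the quartiles
q <- quantile(values, c(0.25, 0.75), names = FALSE)
iqr <- q[2] - q[1]
lower <- q[1] - 1.5 * iqr
upper <- q[2] + 1.5 * iqr
iqr_outliers <- values[values < lower | values > upper]

# z-score rule: flag values more than 3 standard deviations from the mean
z_scores <- (values - mean(values)) / sd(values)
z_outliers <- values[abs(z_scores) > 3]

# Per-column alternative: first quartile of each column separately
col_q1 <- sapply(df, quantile, probs = 0.25, na.rm = TRUE, names = FALSE)
```

    Note that a single extreme value inflates both the mean and the standard deviation, so the z-score rule can miss outliers in small samples; the IQR rule is usually less sensitive to that.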