r, histogram, typechecking

R Type Checking


I have yet another R question. I am trying to do some type checking, but cannot exactly figure out what I am doing wrong.

I am trying to create a histogram of x for each level of y. For instance, I want overlaid histograms of sepal width for each species in the iris data set.

Here is what I have thus far:

    #if x isn't numeric
    if(!is.numeric(x)){
    #if y isn't a factor
    }else if(!is.factor(y)){
    #if the length of x isn't equal to y
    }else if(nChar(x) != nChar(y)){
        #error message
        stop('x is not numeric/y is not a factor/both x and y are the same length')
    }
    #otherwise create histogram
    #testing with iris data set
    hist(y, main = "Iris Species", xlab = "Sepal Width", col = "orange", border = "blue")

Solution

  • I usually use stopifnot() for this, so that the simplest condition is checked first and the more complex ones only afterwards; there is no point testing all of them at once if the first one already fails:

    stopifnot(is.numeric(x))
    stopifnot(is.factor(y))
    stopifnot(length(x) == length(y))
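Since R 4.0.0, stopifnot() also accepts named arguments and uses each name as the error message, so every check can say exactly what went wrong. A minimal sketch, wrapped in a hypothetical helper `check_hist_inputs()` (the name is illustrative, not from the question); the conditions are still evaluated in order and the first failure stops execution:

```r
# Hypothetical helper: validate inputs before plotting.
# Named arguments as error messages require R >= 4.0.0.
check_hist_inputs <- function(x, y) {
  stopifnot(
    "x must be numeric" = is.numeric(x),
    "y must be a factor" = is.factor(y),
    "x and y must be the same length" = length(x) == length(y)
  )
  invisible(TRUE)
}

# Passes silently for the iris columns
check_hist_inputs(iris$Sepal.Width, iris$Species)

# Fails on the first violated condition and reports its own message
msg <- tryCatch(check_hist_inputs(iris$Species, iris$Species),
                error = function(e) conditionMessage(e))
print(msg)  # [1] "x must be numeric"
```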
    

    Alternatively, doing all of this in one go:

    if(!(is.numeric(x) && is.factor(y) && length(x)==length(y))){
        stop("your error message")
    }
    

    Now it's not clear to me why you're testing y here at all, as hist() has no 'y' argument (and calling hist(y) on a factor fails, because hist() needs numeric data). Perhaps you were planning to plot separate histograms of x for each level of y?

    If so, you should be able to adapt the following:

    x <- iris$Sepal.Width
    y <- iris$Species
    l1 <- length(levels(y))
    ## temporarily change plotting parameters
    op <- par(mfrow = c(1, l1))
    for (i in 1:l1){
        hist(x[y == levels(y)[i]],
             main=paste0("Iris Species: ", levels(y)[i]),
             xlab = "Sepal Width",
             col="orange",
             border="blue")
    }
    par(op)
    

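    giving:

    [Figure: three side-by-side histograms of Sepal Width, titled "Iris Species: setosa", "Iris Species: versicolor" and "Iris Species: virginica"]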
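If the goal is a reusable function that first validates its inputs and then draws one histogram per level of y, the type checks and the plotting loop can be combined. A sketch, assuming a hypothetical function name `hist_by_level()`; it uses split() to group x by the levels of y instead of indexing inside the loop, and on.exit() to restore the plotting parameters even if plotting fails:

```r
# Hypothetical wrapper (name is illustrative): validate inputs, then draw
# one histogram of x per level of y, side by side.
hist_by_level <- function(x, y, xlab = "x", ...) {
  stopifnot(is.numeric(x), is.factor(y), length(x) == length(y))
  groups <- split(x, y, drop = TRUE)  # one numeric vector per observed level
  op <- par(mfrow = c(1, length(groups)))
  on.exit(par(op))                    # restore plotting parameters afterwards
  for (lev in names(groups)) {
    hist(groups[[lev]],
         main = paste0("Level: ", lev),
         xlab = xlab, ...)            # extra arguments (col, border, ...) go to hist()
  }
  invisible(groups)
}

# Usage with the iris data
g <- hist_by_level(iris$Sepal.Width, iris$Species,
                   xlab = "Sepal Width", col = "orange", border = "blue")
```

Returning the grouped data invisibly makes it easy to inspect what was plotted without printing it on every call.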

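    I am not aware of an nChar function in R (R is case-sensitive; base R has nchar(), but it counts the characters in each string rather than the number of elements); length() is normally used for this.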
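A quick comparison of length(), which counts the elements of a vector, with nchar(), which counts the characters in each string, using the same iris columns as the answer:

```r
x <- iris$Sepal.Width
y <- iris$Species

length(x)               # number of elements: 150
length(y)               # also 150
length(x) == length(y)  # TRUE, so the length check passes

# nchar() works element-wise on character data, which is a different question:
nchar(c("setosa", "versicolor"))  # 6 10
```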

    Here's the overlapping approach. Note that for loops are often easier to read than the apply family of functions, and any loss in speed is likely to be small.

    for (i in 1:l1){
        hist(x[y == levels(y)[i]],
             ## main=paste0("Iris Species: ", levels(y)[i]),
             main="Iris Species: ",
             xlab = "Sepal Width",
             col=i+1,
             add=!(i==1))
    }
    legend(x=4, y=25, legend=levels(y), fill=1+(1:l1))
    

    giving:

    [Figure: overlaid histograms of Sepal Width for the three species in different colours, titled "Iris Species: ", with a legend]
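In the overlaid loop, each call to hist() picks its own breaks, and the axis ranges are fixed by the first plot, so bars from different species need not line up and later histograms may be cut off. One way around this (a sketch, not the original answer's method) is to compute shared breaks and a common y-limit up front with `plot = FALSE`, then draw the stored histograms with semi-transparent fills from adjustcolor(). The bin width of 0.2 is an arbitrary assumption:

```r
x <- iris$Sepal.Width
y <- iris$Species
lev <- levels(y)

# Shared breaks (bin width 0.2 is an assumption) so all species use one grid
brks <- seq(floor(min(x)), ceiling(max(x)), by = 0.2)

# Compute the histograms without plotting to find a y-limit that fits all of them
h <- lapply(lev, function(l) hist(x[y == l], breaks = brks, plot = FALSE))
ymax <- max(sapply(h, function(hh) max(hh$counts)))

# Same palette colours as the original (2, 3, 4), made 50% transparent
cols <- adjustcolor(seq_along(lev) + 1, alpha.f = 0.5)

for (i in seq_along(lev)) {
  plot(h[[i]], col = cols[i], add = (i > 1),
       ylim = c(0, ymax), main = "Iris Species", xlab = "Sepal Width")
}
legend("topright", legend = lev, fill = cols)
```

Using a keyword position such as "topright" for the legend also avoids hard-coding coordinates that depend on the data.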