
How can I change a dataframe of factors so that the dataframe can be boxplotted?


I have a data frame whose columns each contain a varying number of values and a varying number of NAs. The data frame looks like this:

    V1 V2 V3 V4 V5 V6
1    0 11  4  0  0 10
2    0 17  3  0  2  2
3   NA  0  4  0  1  9
4   NA 12 NA  1  1  0
<snip>
743 NA NA NA NA  8 NA
744 NA NA NA NA  0 NA

I want to make a boxplot out of this, but when I do

boxplot(dataframe)

I get the error

adding class "factor" to an invalid object

When I do

lapply(dataframe,class)

I get the following output:

$V1
[1] "factor"
$V2
[1] "factor"
<snip>
$V6
[1] "factor"

So how can I change my dataframe so that the columns are seen as numeric?


Solution

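  • You want to apply as.numeric(as.character(...)) to each factor column. Calling as.numeric() directly on a factor returns the internal integer level codes rather than the original values, so the conversion has to go through character first. The code below converts only the factor columns and passes the other columns through unchanged. Note that sapply() collects the results into a matrix, so every column of the result ends up with the same type (here numeric).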

    ## dummy data
    df <- data.frame(V1 = factor(sample(1:5, 10, replace = TRUE)),
                     V2 = factor(sample(99:101, 10, replace = TRUE)),
                     V3 = factor(sample(1:2, 10, replace = TRUE)),
                     V4 = 1:10)
    
    df2 <- data.frame(sapply(df, function(x) { if(is.factor(x)) {
                                                  as.numeric(as.character(x))
                                               } else {
                                                  x
                                               }
                                             }))
    

    This gives:

    > df2
       V1  V2 V3 V4
    1   4 101  2  1
    2   1 100  1  2
    3   5  99  2  3
    4   4  99  2  4
    5   2 100  1  5
    6   2 100  2  6
    7   2 101  2  7
    8   4 100  1  8
    9   2 101  2  9
    10  4 101  1 10
    > str(df2)
    'data.frame':   10 obs. of  4 variables:
     $ V1: num  4 1 5 4 2 2 2 4 2 4
     $ V2: num  101 100 99 99 100 100 101 100 101 101
     $ V3: num  2 1 2 2 1 2 2 1 2 1
     $ V4: num  1 2 3 4 5 6 7 8 9 10
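
As a rough sketch of the same idea applied to the question's situation, the snippet below first shows why the as.character() step matters and then converts the factor columns in place with lapply(), which keeps each remaining column's own type instead of coercing everything through a matrix. The small `dataframe` built here is a hypothetical stand-in for the asker's data, and `is_fac` is just an illustrative helper name; neither comes from the original post.

```r
## A factor stores integer level codes; as.numeric() alone returns those codes.
f <- factor(c("10", "0", "2", NA))
as.numeric(f)                # 2 1 3 NA  (level codes, not the values)
as.numeric(as.character(f))  # 10 0 2 NA (the original values; NA is kept)

## Hypothetical stand-in for the question's data frame (assumption)
dataframe <- data.frame(V1 = factor(c(0, 0, NA, NA)),
                        V2 = factor(c(11, 17, 0, 12)),
                        V3 = 1:4)

## Find the factor columns and convert only those, in place.
## lapply() returns a list, so each column keeps its own type.
is_fac <- vapply(dataframe, is.factor, logical(1))
dataframe[is_fac] <- lapply(dataframe[is_fac],
                            function(x) as.numeric(as.character(x)))

str(dataframe)      # V1 and V2 are now num, V3 is still int
boxplot(dataframe)  # all columns are numeric, so this now draws
```

If the columns turned into factors while the file was being read, it may also be worth checking the import step (for example stray non-numeric entries in the file), since fixing that would avoid the conversion altogether.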