
How to choose a range of columns in R


I have some data and just want to calculate the mean, sd, var and so on. My problem is not the functions but the columns: I can't figure out how to select them.

The first column contains the names of the animals, and columns 2 to 11 contain my numeric data. The column names are X1 to X10. There are a lot of NAs in my data.

I can easily calculate it for a single column, but when I combine several columns I always get

Argument is not numeric or logical: returning NA

For example, for the mean of one column I tried this, and it worked:

mean(WLD1$X1, na.rm=TRUE)

For columns 2 to 11 I tried:

mean(WLD1[,c(2:11)], na.rm=TRUE)

I also tried:

lapply(WLD1[,2:11], mean, na.rm=TRUE)

I also tried it with X1:X10.
I guess it's pretty simple, but I'm stuck on it. I'd be really thankful for any help.


Solution

  • You may want to use the apply function. mean() expects a numeric vector, but WLD1[,2:11] is a data frame, which is why you get the "not numeric or logical" warning. apply takes a function and applies it to each row or each column of a data frame or matrix. The MARGIN= parameter selects the direction, and FUN= is the computation you want to run: MARGIN=1 feeds one row at a time into the function, and MARGIN=2 feeds one column at a time. Since you want the mean, sd and var for columns 2 to 11, you need three statements, all with MARGIN=2 and only FUN= changing. Because your data contains NAs, na.rm=TRUE is passed as well; apply forwards it to the function. Below is the code.

    # MARGIN=2 means "one column at a time"; na.rm=TRUE drops the NAs first
    Mean_of_2_to_11_Column <- apply(WLD1[,2:11], MARGIN=2, FUN=mean, na.rm=TRUE)
    SD_of_2_to_11_Column <- apply(WLD1[,2:11], MARGIN=2, FUN=sd, na.rm=TRUE)
    Var_of_2_to_11_Column <- apply(WLD1[,2:11], MARGIN=2, FUN=var, na.rm=TRUE)
    

    Let me know if anything I said here is not clear. All the best.
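
If it helps to see the column selection and the MARGIN= settings on something you can run, here is a minimal sketch. WLD1_demo and its values are made up for illustration and only stand in for WLD1 (three numeric columns instead of ten). It also shows that a name range like X1:X3 does work inside subset(), and that colMeans() is a shortcut when only the means are needed.

```r
# Hypothetical stand-in for WLD1: a name column plus numeric columns with NAs
WLD1_demo <- data.frame(
  animal = c("cat", "dog", "fox", "owl"),
  X1 = c(1, 2, NA, 3),
  X2 = c(10, NA, 30, 50),
  X3 = c(NA, 4, 6, 8)
)

# Select a range of columns by position ...
by_position <- WLD1_demo[, 2:4]
# ... or by name; subset() accepts the X1:X3 shorthand in select=
by_name <- subset(WLD1_demo, select = X1:X3)

# MARGIN = 2: one column at a time; na.rm = TRUE is passed on to FUN
col_means <- apply(by_position, MARGIN = 2, FUN = mean, na.rm = TRUE)
col_sds   <- apply(by_position, MARGIN = 2, FUN = sd,   na.rm = TRUE)
col_vars  <- apply(by_position, MARGIN = 2, FUN = var,  na.rm = TRUE)

# MARGIN = 1: one row at a time (here: the mean per animal)
row_means <- apply(by_position, MARGIN = 1, FUN = mean, na.rm = TRUE)

# colMeans() is a base R shortcut for the column means
col_means_short <- colMeans(by_name, na.rm = TRUE)

# A single mean over all values in the selected columns
overall_mean <- mean(as.matrix(by_position), na.rm = TRUE)

print(col_means)
```

For your real data, replacing WLD1_demo with WLD1 and the range 2:4 with 2:11 (or X1:X3 with X1:X10) should be all that is needed, assuming columns 2 to 11 are all numeric.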