
How do I tell R that the first column of my dataset contains row names? And how do I change the class of a data frame to a vector or matrix?


    APD <- read.csv("APD.csv", header = FALSE)
    rar <- read.csv("rar.csv", header = FALSE)
    ## Making column name labels
    tree <- rep("tree", 100)
    tree_labels <- c(1:100)

    colnames(APD) <- c(paste(tree, tree_labels, sep = ""))
    colnames(rar) <- c(paste(tree, tree_labels, sep = ""))

    correlation.csv <- cor(x = APD, y = rar, method = "spearman")

The script above is supposed to calculate the correlation between the columns of two datasets, but there are two problems. It starts labeling the output from the first column (which holds the row names), so the last score gets NA as its label. I'm not sure if I'm thinking about this correctly, but maybe for the same reason R treats APD as a data frame and so does not calculate the last line.

Cheers

Subset of the CSV file:

    V1          V2          V3          V4
t1  9.368703877 9.693286792 12.44129352 13.06908296
t10 8.128921254 8.940227268 11.40226603 12.17704779
t11 7.87062995  8.697508965 11.39250803 12.17704779

After reading it in, it looks like this:

    V2          V3          V4          V5
    V1          V2          V3          V4
t1  9.368703877 9.693286792 12.44129352 13.06908296
t10 8.128921254 8.940227268 11.40226603 12.17704779
t11 7.87062995  8.697508965 11.39250803 12.17704779

Solution

  • For the first problem, you can use row.names:

    APD <- read.csv("APD.csv", header = FALSE, row.names = 1)
    rar <- read.csv("rar.csv", header = FALSE, row.names = 1)
    

    From the documentation:

    row.names: a vector of row names. This can be a vector giving the actual row names, or a single number giving the column of the table which contains the row names, or character string giving the name of the table column containing the row names.
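
    As a minimal sketch of how `row.names = 1` behaves, the snippet below reads three rows modeled on the sample in the question through the `text` argument of `read.csv` instead of a file. The inline data and the object name `apd` are illustrative assumptions, not the real `APD.csv`.

    ```r
    # Inline CSV standing in for APD.csv (assumed layout: label first, then values).
    csv_lines <- c(
      "t1,9.368703877,9.693286792,12.44129352,13.06908296",
      "t10,8.128921254,8.940227268,11.40226603,12.17704779",
      "t11,7.87062995,8.697508965,11.39250803,12.17704779"
    )
    csv_text <- paste(csv_lines, collapse = "\n")

    # row.names = 1 moves the first column into the row names,
    # so only the numeric columns remain as variables.
    apd <- read.csv(text = csv_text, header = FALSE, row.names = 1)

    rownames(apd)       # the tree labels from the first column
    ncol(apd)           # number of remaining (numeric) columns
    sapply(apd, class)  # class of each remaining column
    ```

    With the labels out of the way, the number of columns matches the number of names assigned through `colnames()`, which should avoid the trailing NA label described in the question.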

  • For the second problem, you can use mapply:

    mapply(cor, APD, rar, MoreArgs = list(method = "spearman"))
    

    assuming that you want the correlation between the 1st column of each table, then the 2nd column, and so on.


    Using your example:

    > str(a)
    'data.frame':   3 obs. of  4 variables:
     $ V1: num  9.37 8.13 7.87
     $ V2: num  9.69 8.94 8.7
     $ V3: num  12.4 11.4 11.4
     $ V4: num  13.1 12.2 12.2
    > mapply(cor, a, -a, MoreArgs = list(method = "spearman"))
    V1 V2 V3 V4 
    -1 -1 -1 -1
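
To make the difference between the two calls concrete, here is a small sketch on random toy data (the objects `apd` and `rar` below are hypothetical stand-ins, not the real files). It assumes both tables have the same number of rows and columns. It compares `cor()` on whole data frames with the column-by-column `mapply()` call, and also shows `as.matrix()` as one way to turn a numeric data frame into a matrix.

```r
# Toy data: hypothetical stand-ins for APD and rar (5 observations, 3 columns).
set.seed(1)
apd <- as.data.frame(matrix(rnorm(15), nrow = 5))
rar <- as.data.frame(matrix(rnorm(15), nrow = 5))

# cor() on two data frames correlates every column of apd
# with every column of rar, giving a 3 x 3 matrix here.
full <- cor(x = apd, y = rar, method = "spearman")
dim(full)

# mapply() pairs the columns up: column 1 with column 1, and so on.
paired <- mapply(cor, apd, rar, MoreArgs = list(method = "spearman"))

# The paired values correspond to the diagonal of the full matrix.
all.equal(unname(paired), unname(diag(full)))

# A numeric data frame can be converted to a matrix with as.matrix().
apd_matrix <- as.matrix(apd)
is.matrix(apd_matrix)
```

If only the matching columns matter, `mapply()` avoids computing the off-diagonal entries; if every pairing is of interest, the full matrix from `cor()` already contains them.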