I have a matrix of sites. I want to run a UPGMA agglomerative clustering on it, using R and the vegan library. In my matrix, not all the variables were measured at every site.
Here is a matrix similar to my data:
Variable 1;Variable 2;Variable 3;Variable 4;Variable 5
0.5849774671338231;0.7962161133598957;0.3478909861199184;0.8027122599553912;0.5596553797833573
0.5904142034898171;0.18185393432022612;0.5503250366728479;NA;0.05657408486342197
0.2265148074206368;0.6345513807275411;0.8048128547418062;0.3303602674038131;0.8924461773052935
0.020429460126217602;0.18850489885886157;0.26412619465769416;0.8020472793070729;NA
0.006945970735023677;0.8404983401121199;0.058385134042814646;0.5750066564897788;0.737599672122899
0.9909722313946067;0.22356808747617019;0.7290078902086897;0.5621006367587756;0.3387823531518016
0.5932907022602052;0.899773235815933;0.5441346748937264;0.8045695319247985;0.6183003409599681
0.6520679140573288;0.5419713133237936;NA;0.7890033752744002;0.8561828607592286
0.31285906479192593;0.3396351688936058;0.5733594373520889;0.03867689654415574;0.1975784885854912
0.5045966366726562;0.6553489439611587;0.029929403932252963;0.42777351534900676;0.8787135401098227
I am planning to do it with the following code:
library(vegan)
# env <- read.csv("matrix_of_sites.csv")
env.norm <- decostand(env, method = "normalize") # Normalizing data here
env.ch <- vegdist(env.norm, method = "euclidean")
env.ch.UPGMA <- hclust(env.ch, method="average")
plot(env.ch.UPGMA)
After I run the second line, I get this error:
Error in x^2 : non-numeric argument to binary operator
I am not familiar with R, so I am not sure if this is due to the cells with no data. How can I solve this?
R does not treat the data in your matrix as numeric: at least some of them were interpreted as character variables and changed to factors. Inspect your data after reading it into R. If all your data are numbers, then sum(env) gives a numeric result. Use the str() or summary() functions for a detailed inspection.
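As a minimal sketch of such an inspection, the snippet below reads the first two data rows of the sample with read.csv() and its default settings. This assumes the file was read that way, as the commented-out line in the question suggests. The names txt and env_bad are hypothetical and used only for illustration.

```r
# First two data rows of the sample, inlined so the sketch is self-contained.
txt <- "Variable 1;Variable 2;Variable 3;Variable 4;Variable 5
0.5849774671338231;0.7962161133598957;0.3478909861199184;0.8027122599553912;0.5596553797833573
0.5904142034898171;0.18185393432022612;0.5503250366728479;NA;0.05657408486342197"

# Default read.csv() splits on commas, so every line ends up as one text field.
env_bad <- read.csv(text = txt)

str(env_bad)                      # shows a single non-numeric column
summary(env_bad)
sapply(env_bad, class)            # character (factor in R versions before 4.0)
all(sapply(env_bad, is.numeric))  # FALSE here: the data were not read as numbers
```

If the last check returns FALSE for your own data frame, the problem lies in how the file was read, not in the missing values.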
From R's point of view, your data file has mixed formatting. The R function read.csv assumes that items are separated by a comma (,) and that the decimal separator is a period (.), whereas read.csv2 assumes that items are separated by a semicolon (;) and that the decimal separator is a comma (,). Your file mixes these two conventions. You can read data formatted like that, but you may have to give both the sep and dec arguments.
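A minimal sketch of reading such a file, giving both sep and dec explicitly. The sample rows are inlined through the text argument so the snippet runs on its own. For the real file you would pass the file name instead ("matrix_of_sites.csv" is the name used in the question). The variable txt is hypothetical.

```r
# Three data rows of the sample (semicolon-separated, period as decimal mark).
txt <- "Variable 1;Variable 2;Variable 3;Variable 4;Variable 5
0.5849774671338231;0.7962161133598957;0.3478909861199184;0.8027122599553912;0.5596553797833573
0.5904142034898171;0.18185393432022612;0.5503250366728479;NA;0.05657408486342197
0.2265148074206368;0.6345513807275411;0.8048128547418062;0.3303602674038131;0.8924461773052935"

# Semicolon as field separator, period as decimal separator.
env <- read.csv(text = txt, sep = ";", dec = ".")
# With the real file: env <- read.csv("matrix_of_sites.csv", sep = ";", dec = ".")

str(env)                  # five numeric columns; the NA cell is a missing value
sum(env, na.rm = TRUE)    # a numeric result confirms that all columns are numbers
```

The same result should be obtainable with read.csv2(..., dec = "."), since only the decimal separator differs from that function's defaults.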
Once your data are read correctly into R, decostand will stop with an error: it does not accept missing values unless you add na.rm = TRUE. The same applies to the next vegdist command: it also needs na.rm = TRUE to analyse your data.
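Putting this together, here is a sketch of the whole workflow on the first five rows of the sample data, assuming the vegan package is installed. The variable txt is hypothetical. Whether pairwise handling of missing values is appropriate for your sites is a judgement you need to make for your own data.

```r
library(vegan)

# First five data rows of the sample; two of them contain a missing value.
txt <- "Variable 1;Variable 2;Variable 3;Variable 4;Variable 5
0.5849774671338231;0.7962161133598957;0.3478909861199184;0.8027122599553912;0.5596553797833573
0.5904142034898171;0.18185393432022612;0.5503250366728479;NA;0.05657408486342197
0.2265148074206368;0.6345513807275411;0.8048128547418062;0.3303602674038131;0.8924461773052935
0.020429460126217602;0.18850489885886157;0.26412619465769416;0.8020472793070729;NA
0.006945970735023677;0.8404983401121199;0.058385134042814646;0.5750066564897788;0.737599672122899"

env <- read.csv(text = txt, sep = ";", dec = ".")

# Normalize each site (row), ignoring missing values.
env.norm <- decostand(env, method = "normalize", na.rm = TRUE)

# Euclidean distances between sites, with missing observations removed pairwise.
env.ch <- vegdist(env.norm, method = "euclidean", na.rm = TRUE)

# UPGMA is average-linkage agglomerative clustering.
env.ch.UPGMA <- hclust(env.ch, method = "average")
plot(env.ch.UPGMA)
```

Before clustering, it is worth checking anyNA(env.ch): the vegan documentation notes that some dissimilarities may still be NA even with na.rm = TRUE, and hclust cannot work with those.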