Search code examples
rnahclustpheatmap

Pheatmap won’t cluster rows: NA/NaN/Inf in foreign function call (arg 10)


I have been using the same variations of a pheatmap code to make heat maps for several months now without any problems, but lately it has stopped being able to cluster rows. Columns still cluster like normal but whenever I try to add row clustering it gives me the same error message about NA/NaN/Inf in the data

All of my datasets look very similar, with primarily just the number of rows changing (between 40-2000+). Here is a head of data I'm currently using, with all 0s already replaced with NA:

> head(protdata, 4)
          PR1      PO1      WA1     PR2      PO2      WA2      PR3      PO3     WA3      PR4 PO4     WA4      PR5      PO5
[1,] 0.004420       NA 0.002370 0.00141 0.002890 0.003740 4.36e-03 0.005370 0.00143 0.002070  NA 0.00428 0.005220       NA
[2,] 0.000233 8.85e-06 0.000136      NA 0.000056 0.000713 5.98e-05       NA      NA 0.000541  NA      NA 0.006700 4.95e-05
[3,] 0.001220 1.79e-05 0.000447 0.00183 0.000136       NA 6.99e-04 0.000298 0.00267 0.001330  NA      NA 0.000655 1.36e-04
[4,] 0.001170 6.84e-04 0.000282 0.00173 0.001620 0.000648 1.05e-03 0.003570 0.00101 0.001410  NA      NA 0.002960       NA
          WA5     PR6      PO6      WA6      PR7      PO7      WA7
[1,] 0.001030 0.00448       NA 1.53e-03 0.005220 0.005520 1.86e-03
[2,] 0.000139 0.00145 0.000484 8.88e-05 0.000118 0.000122 1.79e-05
[3,] 0.003680 0.00033       NA       NA       NA 0.000163 3.99e-03
[4,] 0.000393 0.00023       NA       NA 0.000625       NA 7.15e-04

There are a lot of 0s in the datasets, but clustering has always worked as long as they are converted to NA. None of the columns or rows are zero variance. Here is the code I've been using to make the heat maps:

protdata <- as.matrix(input[,-1])
protdata[protdata == 0] <- NA

rownames <- input[,1]
annotation_row <- data.frame(rownames)
rownames(protdata) <- annotation_row$Gene

pheatmap(log10(protdata), scale="row", border_color=NA, na_col="white", breaks=seq(-2,2,.01),
     color=colorRampPalette(rev(brewer.pal(n=7, name="RdYlBu")))(400))

And here is the error message I keep getting:

Error in hclust(d, method = method) : 
  NA/NaN/Inf in foreign function call (arg 10)

The only way I can get a plot to appear is with cluster_rows=FALSE included in the above. I am stumped as to why this was working perfectly and now isn't, when as far as I know I haven't changed anything with the way I'm inputting my data.

Any help would be greatly appreciated!!


Solution

  • I converted your file into a csv and read in:

    mat = read.csv("peet_protdata.csv",row.names=1)
    mat[mat==0] = NA
    

    No rows with all NAs or zero variance like you said, but if you do the dist calculation, there are NAs in some of the entries, indicating between some rows, it's not possible to calculate euclidean distances. You need to the euclidean distance matrix to have no NAs to do clustering:

     sum(is.na(as.matrix(dist(mat))))
    [1] 434
    

    Below is a quick (nasty) bit to find the rows giving the most NAs, remove them to give you a complete distance matrix:

    giveNAs = which(is.na(as.matrix(dist(mat))),arr.ind=TRUE)
    head(giveNAs)
        row col
    G103  18   1
    G100  53   1
    

    So for example, row 18 and row 1 is giving you problems, and you can see there are no complete observations (pairwise):

    mat[c(1,18),]
         PR1 PO1 WA1         PR2 PO2 WA2         PR3 PO3 WA3         PR4
    G56   NA  NA  NA 0.000483209  NA  NA 0.000433088  NA  NA 0.000203604
    G103  NA  NA  NA          NA  NA  NA          NA  NA  NA          NA
                 PO4         WA4         PR5        PO5 WA5 PR6 PO6 WA6 PR7
    G56  0.000294898 0.000269724 0.000299341 0.00046987  NA  NA  NA  NA  NA
    G103          NA          NA          NA         NA  NA  NA  NA  NA  NA
                 PO7         WA7         PR8 PO8 WA8         PR9         PO9 WA9
    G56  0.000682594 0.000656168 0.000702988  NA  NA          NA          NA  NA
    G103          NA          NA          NA  NA  NA 0.000629987 0.000504159  NA
    

    We get the rows out and start checking what to remove:

    tab = sort(table(c(giveNAs)),decreasing=TRUE)
    checkNA = sapply(1:length(tab),function(i){
    sum(is.na(as.matrix(dist(mat[-as.numeric(names(tab[1:i])),]))))
    })
    rmv = names(tab)[1:min(which(checkNA==0))]
    
     [1] "18"  "53"  "81"  "84"  "54"  "97"  "55"  "38"  "70"  "100" "31"  "93" 
    [13] "52"  "80"  "91"
    

    We remove those 15 rows:

    mat = mat[-as.numeric(rmv),]
    pheatmap(mat)
    

    enter image description here