
Error in apply(counts, 2, function(x) rpkm(x, lengths)) : dim(X) must have a positive length


I am trying to use the tpm_rpkm.R script, but I am getting this error:

Error in apply(counts, 2, function(x) rpkm(x, lengths)) : dim(X) must have a positive length.

(The data table should not contain any errors: it was generated with the same program, featureCounts, that the script's author used.)

Here is the script

#! /usr/bin/env Rscript

# Author: Andy Saurin
#
# Simple RScript to calculate RPKMs and TPMs
# based on method for RPKM/TPM calculations shown in http://www.rna-seqblog.com/rpkm-fpkm-and-tpm-clearly-explained/
#
# The input file is the output of featureCounts
#

rpkm <- function(counts, lengths) {
  pm <- sum(counts) /1e6
  rpm <- counts/pm
  rpm/(lengths/1000)
}

tpm <- function(counts, lengths) {
  rpk <- counts/(lengths/1000)
  coef <- sum(rpk) / 1e6
  rpk/coef
}


## read table from featureCounts output
args <- commandArgs(T)

tag <- tools::file_path_sans_ext(args[1])


cat('Reading in featureCounts data...')
ftr.cnt <- read.table(args[1], sep="\t", header=T, quote="") #Important to disable default quote behaviour or else genes with apostrophes will be taken as strings
cat(' Done\n')

if ( ncol(ftr.cnt) < 7 ) { 
    cat(' The input file is not the raw output of featureCounts (expected more than 6 columns) \n')
    quit('no')
}

lengths <- ftr.cnt[,6]

counts <- ftr.cnt[,7:ncol(ftr.cnt)]

cat('Performing RPKM calculations...')

rpkms <- apply(counts, 2, function(x) rpkm(x, lengths) )
ftr.rpkm <- cbind(ftr.cnt[,1:6], rpkms)

write.table(ftr.rpkm, file=paste0(tag, "_rpkm.txt"), sep="\t", row.names=FALSE, quote=FALSE)
cat(' Done.\n\tSaved as ')
cat ( paste0(tag, "_rpkm.txt", '\n') )

cat('Performing TPM calculations...')

tpms <- apply(counts, 2, function(x) tpm(x, lengths) )

ftr.tpm <- cbind(ftr.cnt[,1:6], tpms)

write.table(ftr.tpm, file=paste0(tag, "_tpm.txt"), sep="\t", row.names=FALSE, quote=FALSE)
cat(' Done.\n\tSaved as ')
cat ( paste0(tag, "_tpm.txt", '\n') )


quit('no')

Command output:

    Rscript tpm_rpkm.R 450-3-hard_filtered.featureCounts
    Reading in featureCounts data... Done
    Performing RPKM calculations...Error in apply(counts, 2, function(x) rpkm(x, lengths)) :
      dim(X) must have a positive length
    Execution halted

My featureCounts table looks like this:

    Geneid | Chr | Start | End | Strand | Length |
    1_1 | NODE_1_length_59711_cov_84.026979_g0_i0 | 116 | 904 | + | 789 | 198
    1_2 | NODE_1_length_59711_cov_84.026979_g0_i0 | 1178 | 3514 | - | 2337 | 2294
    1_3 | NODE_1_length_59711_cov_84.026979_g0_i0 | 3618 | 4319 | + | 702 | 502
    1_4 | NODE_1_length_59711_cov_84.026979_g0_i0 | 4337 | 4921 | + | 585 | 320
    1_5 | NODE_1_length_59711_cov_84.026979_g0_i0 | 4953 | 5906 | + | 954 | 799
    1_6 | NODE_1_length_59711_cov_84.026979_g0_i0 | 5920 | 7056 | + | 1137 | 532
    1_7 | NODE_1_length_59711_cov_84.026979_g0_i0 | 7061 | 8071 | + | 1011 | 761
    1_8 | NODE_1_length_59711_cov_84.026979_g0_i0 | 8068 | 8766 | + | 699 | 188
    1_9 | NODE_1_length_59711_cov_84.026979_g0_i0 | 8766 | 9656 | + | 891 | 217
    1_10 | NODE_1_length_59711_cov_84.026979_g0_i0 | 9640 | 10710 | + | 1071 | 408
    1_11 | NODE_1_length_59711_cov_84.026979_g0_i0 | 10692 | 11348 | + | 657 | 162
    1_12 | NODE_1_length_59711_cov_84.026979_g0_i0 | 11359 | 12282 | + | 924 | 342

Does anyone know how to deal with this?


Solution

  • Update the counts definition to this:

    counts <- ftr.cnt[,7:ncol(ftr.cnt), drop=FALSE]
    

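    Your table has only seven columns (the six annotation columns plus a single count column), so 7:ncol(ftr.cnt) is just 7. When a single column is selected, R's [ operator drops the result to a plain vector by default, and a vector has no dim attribute, which is exactly what apply() complains about. With drop=FALSE the result stays a one-column, two-dimensional data frame, so apply() can iterate over its columns.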
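A minimal, self-contained reproduction may help confirm the diagnosis. It is a sketch that assumes only the first three rows of the table above and uses a hypothetical name, sample1, for the single count column (in real featureCounts output that header is the BAM file path). The rpkm() and tpm() functions are copied from the script; the snippet shows the dropped vector, the resulting apply() error, and the fixed subsetting:

```r
# rpkm() and tpm() copied from the tpm_rpkm.R script in the question.
rpkm <- function(counts, lengths) {
  pm <- sum(counts) / 1e6
  rpm <- counts / pm
  rpm / (lengths / 1000)
}

tpm <- function(counts, lengths) {
  rpk <- counts / (lengths / 1000)
  coef <- sum(rpk) / 1e6
  rpk / coef
}

# First three rows of the featureCounts table; "sample1" is a hypothetical
# name for the single count column (column 7).
ftr.cnt <- data.frame(
  Geneid  = c("1_1", "1_2", "1_3"),
  Chr     = "NODE_1_length_59711_cov_84.026979_g0_i0",
  Start   = c(116, 1178, 3618),
  End     = c(904, 3514, 4319),
  Strand  = c("+", "-", "+"),
  Length  = c(789, 2337, 702),
  sample1 = c(198, 2294, 502)
)
lengths <- ftr.cnt[, 6]

# 7:ncol(ftr.cnt) is just 7, so `[` drops the single column to a plain vector.
dropped <- ftr.cnt[, 7:ncol(ftr.cnt)]
print(is.null(dim(dropped)))  # TRUE: no dim attribute

# apply() on that vector reproduces the error from the question.
err <- tryCatch(
  apply(dropped, 2, function(x) rpkm(x, lengths)),
  error = function(e) conditionMessage(e)
)
print(err)

# drop = FALSE keeps a one-column data frame, so apply() works.
counts <- ftr.cnt[, 7:ncol(ftr.cnt), drop = FALSE]
rpkms <- apply(counts, 2, function(x) rpkm(x, lengths))
tpms <- apply(counts, 2, function(x) tpm(x, lengths))
print(dim(rpkms))     # 3 1
print(colSums(tpms))  # TPMs sum to 1e6 per sample (up to rounding)
```

Because the TPM step in the script reuses the same counts object, the single change to the counts definition fixes both the RPKM and the TPM calculations.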