Tags: r, scale, rows, bioinformatics, normalizing

How to normalize data in R excluding certain rows?


I am trying to graph some sequencing data and want to exclude the Chromosome 4 data (rows where the first column is 4) when I scale it. Chromosome 4 may skew the normalization, so I want to leave it out of my scale() call. Is there any way to do that? Right now, I have:

preMBT_RT <- preMBT_RT %>% mutate_each_(funs(scale(.) %>% as.vector), vars = c("Timing"))

Is there any way to tell that function to exclude rows with a 4 in the first column? Or is the only way to create a new data frame that does not contain the chromosome 4 data?

Here is a sample of what the data frame looks like in brief:

Chromosome     Location     Replication Timing
1              3748         -0.0001
4              1847101      0.000302   <-row I would want to exclude
20             1234         0.000102
...            ...          ...

Solution

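  • You can use dplyr's filter() function to drop the chromosome 4 rows before scaling. Note that this also removes those rows from the resulting data frame:

    preMBT_RT <- preMBT_RT %>% filter(Chromosome != 4) %>%
      mutate_each_(funs(scale(.) %>% as.vector), vars = c("Timing"))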
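
If chromosome 4 should still appear in the data frame (for plotting, say) but not influence the normalization, one option is to compute the mean and standard deviation from the other chromosomes only and apply them to every row. Below is a minimal sketch, assuming the column names Chromosome and Timing from the question's code. The toy values beyond the three sample rows and the Timing_scaled column name are made up for illustration. It calls mutate() directly, since mutate_each_() and funs() are deprecated in recent dplyr versions.

```r
library(dplyr)

# Toy data: the first three rows come from the question's sample,
# the remaining rows are invented purely for illustration.
preMBT_RT <- data.frame(
  Chromosome = c(1, 4, 20, 1, 4, 20),
  Location   = c(3748, 1847101, 1234, 5000, 2000000, 8000),
  Timing     = c(-0.0001, 0.000302, 0.000102, 0.0002, 0.0009, -0.0003)
)

# Center and scale every row, but estimate the mean and standard
# deviation only from rows that are NOT on chromosome 4.
# Timing_scaled is a hypothetical name for the new column.
scaled <- preMBT_RT %>%
  mutate(
    Timing_scaled = (Timing - mean(Timing[Chromosome != 4])) /
      sd(Timing[Chromosome != 4])
  )

print(scaled)
```

With this approach the chromosome 4 rows are kept and expressed on the same scale as the rest, so they can still be plotted or filtered out later. If they should be gone entirely, the filter() approach above is simpler.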