
How to use wildcards to define col_types when using readr?


A few days ago I asked how to set a specific column type when using the readr package (see "big integers when reading file with readr in r").

Is there a way to specify the columns by wildcard? In my case, I sometimes have several columns whose names start with Intensity, followed by a suffix that depends on the experiment. It is hard to use read_tsv in a function if you do not know upfront which project names were used.

So something like col_types = cols('Intensity.*' = col_double()) would be awesome.

Does anyone have an idea how to get this feature?

EDIT: Maybe something like this: read the first 2 lines, grep 'Intensity' in the names, and then somehow build the parameter, e.g. cols(Intensity=col_double(), 'Intensity pg'=col_double(), 'Intensity hs'=col_double()). But I have no idea how to create this parameter value on the fly.


Solution

  • I am adding the answer that solved my problem, based on lukeA's comment...

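    library(readr)

    # Read a MaxQuant TSV, forcing every Intensity/LFQ/iBAQ column to double.
    read_MQtsv <- function(file) {
      # Read only the first data row to get the column names exactly as written.
      header <- names(read.delim(file, nrows = 1, check.names = FALSE))
      matches <- grep('Intensity|LFQ|iBAQ', header, value = TRUE)
      # One col_double() per matching column; all other columns are guessed.
      col_spec <- setNames(rep(list(col_double()), length(matches)), matches)
      read_tsv(file, col_types = do.call(cols, col_spec))
    }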
    

    I adapted the one-liner from the comment into a function that I use to read my files, which are produced by a program called MaxQuant.
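
For anyone who wants to try the idea without a MaxQuant file, here is a minimal sketch, assuming readr is installed. The demo file, its column names, and its values are made up for illustration. Splicing the named list into cols() with do.call() is just one way to build the spec on the fly, not necessarily the only or the best one.

```r
library(readr)

# Hypothetical demo file: a tiny tab-separated table with made-up columns.
demo_file <- tempfile(fileext = ".tsv")
writeLines(c(
  "Protein\tIntensity\tIntensity pg\tIntensity hs",
  "P1\t12345678901\t0\t42",
  "P2\t98765432109\t7\t0"
), demo_file)

# Read the header plus one data row, just to learn the column names.
header <- names(read.delim(demo_file, nrows = 1, check.names = FALSE))

# Keep the names that start with "Intensity" (the "wildcard" step).
matches <- grep("^Intensity", header, value = TRUE)

# Build one col_double() per matching name and splice the named list into
# cols(); columns that are not listed fall back to cols()'s default guessing.
spec <- do.call(cols, setNames(rep(list(col_double()), length(matches)), matches))

dat <- read_tsv(demo_file, col_types = spec)
```

Because the spec is built from whatever names the header contains, the same code should work no matter which experiment suffixes appear in a given file. If no column matches, the list is empty and every column is simply guessed.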