read.delim() function for columns that start at different points in R


I'm trying to use the read.delim() function to read a CSV (I know I can use read.csv(), but I need to do it this way). The header of the CSV file is spread over two rows, so the column names begin at different points. How would I go about incorporating that into the code? The CSV would look like this (example):

,,,Column_D, Column_E,
Column_A, Column_B, Column_C,,,
1,1,2,3,4,
.,.,.,.,.,
.,.,.,.,.,

I've tried this:

    dataRAW <- read_delim("./data/something.csv", delim = ",", col_types = cols(
          Column_A = col_integer(),
          Column_B = col_integer(),
          Column_C = col_integer(),
          Column_D = col_integer(),
          Column_E = col_integer()
        ), skip = 1)

When R reads the file, columns A, B, and C get proper headings, but D and E don't. I would like all of them to have their proper headings. If I don't use the skip argument, then columns D and E get proper headings, but A, B, and C don't.


Solution

  • As proposed by @Tung, you can skip the first two lines, but instead of setting the column names manually, you can read the first two lines of the file separately and combine them to build the column names.

    library(tidyverse)
    
    # Data rows only: skip both header lines
    d <- read_delim("~/Bureau/something.csv", delim = ",", skip = 2, col_names = FALSE)
    # First header line (holds the names of Column_D and Column_E)
    names1 <- read_delim("~/Bureau/something.csv", delim = ",", 
                         skip = 0, n_max = 1, col_names = FALSE) %>% t %>% as.vector
    # Second header line (holds the names of Column_A to Column_C)
    names2 <- read_delim("~/Bureau/something.csv", delim = ",", 
                         skip = 1, n_max = 1, col_names = FALSE) %>% t %>% as.vector
    

    Replace the NA entries in both name vectors with empty strings, then combine the two vectors with a simple paste0().
    Note that in your example the last column has no name and " Column_E" starts with a space character...

    names1[is.na(names1)] <- ""
    names2[is.na(names2)] <- ""
    
    colnames(d) <- paste0(names1, names2)
    
    d
    #> # A tibble: 3 x 6
    #>   Column_A ` Column_B` ` Column_C` Column_D ` Column_E` ``   
    #>   <chr>    <chr>       <chr>       <chr>    <chr>       <chr>
    #> 1 1        1           2           3        4           <NA> 
    #> 2 .        .           .           .        .           <NA> 
    #> 3 .        .           .           .        .           <NA>
    

    Created on 2018-03-10 by the reprex package (v0.2.0).
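
Since the question asks for read.delim() specifically, the same idea can be sketched in base R. This is only a minimal sketch, not part of the original answer: it assumes the file looks exactly like the sample in the question (written here to a temporary file), uses strip.white = TRUE to drop the stray leading spaces in the names, and discards the unnamed trailing column created by the trailing comma. The variable names (path, hdr, new_names, keep) are made up for illustration.

```r
# Write the sample from the question to a temporary file (hypothetical path).
path <- tempfile(fileext = ".csv")
writeLines(c(
  ",,,Column_D, Column_E,",
  "Column_A, Column_B, Column_C,,,",
  "1,1,2,3,4,",
  ".,.,.,.,.,",
  ".,.,.,.,.,"
), path)

# Read only the two header rows as text; strip.white trims " Column_B" etc.
hdr <- read.delim(path, sep = ",", header = FALSE, nrows = 2,
                  colClasses = "character", strip.white = TRUE)

# Each column has a name in exactly one of the two rows (or in neither),
# so pasting the rows together yields one name per column.
new_names <- paste0(unlist(hdr[1, ]), unlist(hdr[2, ]))

# Read the data rows, skipping both header rows.
d <- read.delim(path, sep = ",", header = FALSE, skip = 2,
                colClasses = "character")

# Drop columns that ended up without a name (the trailing comma), then rename.
keep <- nzchar(new_names)
d <- d[, keep, drop = FALSE]
names(d) <- new_names[keep]

print(d)
```

Every column is kept as character here because the sample rows contain "." placeholders; once the real file holds only numbers, passing the result to type.convert(d, as.is = TRUE) is one way to recover integer columns.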