How to import a CSV file with "@" as an additional delimiter using readr


This is my CSV file. I would like to import it with readr::read_csv() (or another function), using both "," and "@" as delimiters.

My .csv file looks like this (header plus two rows):

    CoreId,Identifier,type,rightsHolder,creator,accessURI,format,variantLiteral,license
    296911@OBS,https://observation.org/photos/9999/,StillImage,André den Ouden,Andre den Ouden,https://observation.org/photos/9999.jpg,image/jpeg,Best Quality,All rights reserved
    45689812@OBS,https://observation.org/photos/999999/,StillImage,Hans van Kersbergen,Hans van Kersbergen,https://observation.org/photos/999999.jpg,image/jpeg,Best Quality,CC BY-NC-ND 4.0

To import the file, I use:

    library(readr)
    read_csv(file = 'myfile.csv', progress = TRUE)

In the output, however, '@' is not treated as a separator: the text on both sides of it stays merged in the first column, `CoreId` (e.g. `296911@OBS`), instead of being split into two columns.
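For a reproducible example that needs neither the original file nor readr, the header and two rows above can be written to a temporary file and read back with base R's `read.csv()`. This is only a sketch: the temporary path stands in for 'myfile.csv', and `csv_lines` and `myfile` are illustrative names.

```r
# Header and the two sample rows from above ("\u00e9" is the "é" in "André")
csv_lines <- c(
  "CoreId,Identifier,type,rightsHolder,creator,accessURI,format,variantLiteral,license",
  "296911@OBS,https://observation.org/photos/9999/,StillImage,Andr\u00e9 den Ouden,Andre den Ouden,https://observation.org/photos/9999.jpg,image/jpeg,Best Quality,All rights reserved",
  "45689812@OBS,https://observation.org/photos/999999/,StillImage,Hans van Kersbergen,Hans van Kersbergen,https://observation.org/photos/999999.jpg,image/jpeg,Best Quality,CC BY-NC-ND 4.0"
)

# A temporary file stands in for 'myfile.csv'; useBytes keeps the UTF-8 bytes as-is
myfile <- tempfile(fileext = ".csv")
writeLines(csv_lines, myfile, useBytes = TRUE)

# Splitting on "," only: "@" stays inside the first column
mydata <- read.csv(myfile, encoding = "UTF-8")
mydata$CoreId
```

Like `read_csv()`, `read.csv()` splits only on commas, so `CoreId` keeps values such as `296911@OBS` and the result has 9 columns instead of 10.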

Is it possible to explicitly treat '@' as a delimiter to separate the columns?

Any help would be appreciated.


Solution

  • If the data is already loaded, i.e.

    mydata <- read_csv(file = 'myfile.csv', progress = TRUE)
    

    has been executed, you can split the `CoreId` column afterwards:

    # split "296911@OBS" into "296911" and "OBS", one row per record
    cbind.data.frame(
      t(list2DF(strsplit(mydata$CoreId, "@"))) |> `colnames<-`(c("Core", "Id")), 
      mydata[-1]) |> 
      `row.names<-`(NULL) # cosmetics: reset the row names
    
    # or (writing less): rbind the split pieces into a two-column matrix
    cbind(do.call("rbind", strsplit(mydata$CoreId, "@")), mydata[-1]) |> 
      `names<-`(c("Core", "Id", names(mydata[-1]))) 
    

    giving

          Core  Id                             Identifier       type        rightsHolder
    1   296911 OBS   https://observation.org/photos/9999/ StillImage     André den Ouden
    2 45689812 OBS https://observation.org/photos/999999/ StillImage Hans van Kersbergen
    

    Input

    (truncated)

    mydata = structure(
      list(
        CoreId = c("296911@OBS", "45689812@OBS"),
        Identifier = c(
          "https://observation.org/photos/9999/",
          "https://observation.org/photos/999999/"
        ),
        type = c("StillImage", "StillImage"),
        rightsHolder = c("André den Ouden", "Hans van Kersbergen")
      ),
      class = "data.frame",
      row.names = c(NA, -2L)
    )
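If you would rather split at read time, one option is to preprocess the raw lines before parsing. Treat this as a hedged sketch, not a readr feature: as far as I know, readr's `read_delim()` takes a single `delim` character, so "@" cannot be passed as a second delimiter directly. The idea is to read the file as text, turn the first "@" on each data line into a comma, rename the header to match, and parse the result. The code uses base R so it runs without readr. The file is recreated in a temporary path that stands in for 'myfile.csv', and `raw`, `body` and `header` are illustrative names.

```r
# Recreate the sample file (a temporary file stands in for 'myfile.csv')
csv_lines <- c(
  "CoreId,Identifier,type,rightsHolder,creator,accessURI,format,variantLiteral,license",
  "296911@OBS,https://observation.org/photos/9999/,StillImage,Andr\u00e9 den Ouden,Andre den Ouden,https://observation.org/photos/9999.jpg,image/jpeg,Best Quality,All rights reserved",
  "45689812@OBS,https://observation.org/photos/999999/,StillImage,Hans van Kersbergen,Hans van Kersbergen,https://observation.org/photos/999999.jpg,image/jpeg,Best Quality,CC BY-NC-ND 4.0"
)
myfile <- tempfile(fileext = ".csv")
writeLines(csv_lines, myfile, useBytes = TRUE)

# Read the file as plain lines
raw <- readLines(myfile, encoding = "UTF-8")

# Data lines: replace only an "@" that appears before the first comma,
# so an "@" inside a later field (e.g. an e-mail address) is left alone
body <- sub("^([^,@]*)@", "\\1,", raw[-1])

# Header: the single name "CoreId" becomes two names, "Core" and "Id"
header <- sub("^CoreId,", "Core,Id,", raw[1])

# Parse the fixed-up lines; "Core" is converted to integer automatically
mydata <- read.csv(text = c(header, body), encoding = "UTF-8")
mydata[, 1:3]
```

Anchoring the regular expression to the start of the line keeps the change limited to the first field. In recent readr versions, the fixed-up lines could also be joined with `paste(c(header, body), collapse = "\n")` and passed to `read_csv()` wrapped in `I()`, which marks them as literal data rather than a file path.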