How to pinpoint the problem read_delim is having?


Just to preface, I'm very new to R, and sorry for the "special characters". I'm currently trying to read a CSV file I'm working with. Here is my code:

X17_01_24_Rawdata_SSB_fish_2021 <- read_delim("17-01-24-Rawdata-SSB-fish-2021.csv", 
    delim = ";", escape_double = FALSE, col_names = FALSE, 
    locale = locale(encoding = "latin1"), 
    trim_ws = TRUE, skip = 2)

Here are the first few lines of the CSV:

"09283: Eksport av fisk, etter land, statistikkvariabel, ?r og varegruppe",
,
"land;""Verdi (mill. kr) 2021 Fisk"," krepsdyr og bl?tdyr i alt"";""Verdi (mill. kr) 2021 Laks"";""Verdi (mill. kr) 2021 Torsk"";""Verdi (mill. kr) 2021 Sild"";""Verdi (mill. kr) 2021 Makrell"";""Verdi (mill. kr) 2021 Sei"";""Verdi (mill. kr) 2021 ?rret"";""Verdi (mill. kr) 2021 Hyse"";""Verdi (mill. kr) 2021 Lange"";""Verdi (mill. kr) 2021 Brosme"";""Verdi (mill. kr) 2021 Uer"";""Verdi (mill. kr) 2021 Kveite"";""Verdi (mill. kr) 2021 Annen fisk"";""Verdi (mill. kr) 2021 Reker"";""Verdi (mill. kr) 2021 Andre skalldyr/bl?tdyr"""
Albania;4;0;0;1;0;0;0;0;0;0;0;0;0;:;3,
Andorra;0;0;0;0;0;0;0;0;0;0;0;0;0;:;0,
Belarus;1135;179;0;170;58;0;701;0;0;0;1;0;25;:;0,

[Image: the table from which the CSV was generated]

My goal is for the data to be split at the delimiter ; into separate columns. If I skip line 3 as well, it works and I just get default column names. When I don't skip it, nothing is split. I could just rename the columns manually, but that seems very brute-force. I'm also aware that even once the columns are split, they will need cleaning.

Is read_delim the wrong tool? How does it work, and what is keeping it from doing what it's "supposed to"?


Solution

  • Setting col_names to TRUE and quote to "" (empty string) seems to do what you want, once you also strip the extra quotation marks from the column names. I think the main problem is that the semicolon delimiters in the header line sit inside quotation marks, so read_delim treats them as part of a quoted field rather than as separators.

    library(readr)
    library(dplyr)

    read_delim("tmp.dat",
        delim = ";", escape_double = FALSE, col_names = TRUE, 
        locale = locale(encoding = "latin1"), 
        trim_ws = TRUE, skip = 2,
        quote = "") |>
      # strip the stray quotation marks left in the column names
      rename_with(~ stringr::str_remove_all(., '"'))
    

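    I would probably follow this with a step that strips the repeated prefix (fixed() makes the parentheses and dots match literally rather than as regex syntax):

    rename_with(~ stringr::str_remove(., stringr::fixed("Verdi (mill. kr) 2021 ")))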
    

    ...
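
To see what the renaming steps above are doing without touching the real file, here is a minimal base-R sketch on a shortened, made-up stand-in for the header line (only two of the value columns, and the `header`, `fields` and `clean` names are just for illustration). It assumes the header looks like the one in the question: every semicolon is present, but wrapped in stray quotation marks.

```r
# Shortened, hypothetical stand-in for the header line of the CSV
# (single quotes let the literal double quotes appear unescaped).
header <- '"land;""Verdi (mill. kr) 2021 Laks"";""Verdi (mill. kr) 2021 Torsk"""'

# Split on every semicolon, treating quotation marks as ordinary characters.
# This mirrors what quote = "" asks read_delim to do.
fields <- strsplit(header, ";", fixed = TRUE)[[1]]

# Drop the stray quotation marks from each piece.
clean <- gsub('"', "", fields, fixed = TRUE)

# Remove the repeated prefix; fixed = TRUE matches "(", ")" and "." literally.
clean <- sub("Verdi (mill. kr) 2021 ", "", clean, fixed = TRUE)

print(clean)
```

The literal matching matters here: read as a regular expression, the parentheses in "Verdi (mill. kr) 2021 " would form a group instead of matching the actual "(" and ")" characters, so the prefix would not be found.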