Import fixed width data file with no line separator


I have fixed-width data files (.dbf) that don't have line separators. Here is what two lines of such a data file look like:

20141101 77h  3.210                                  0    3 20141102 76h  3.090                                  0    3 

The widths of one line are c(8,4,7,41): the date (8), some time measure (4), the data point (7), and some other columns that I can summarize in one "rest" column (41). There is no separator after a line; the next line is simply appended to the previous one, so all time steps are written consecutively in one massive line. The file contains only numbers, letters and white space.

With read.fwf('filepath', widths = c(8,4,7,41)), R stops reading after the first line because there is no line separator.

Is there an argument that tells read.fwf() where a new line starts when there is no line separator? Or should I use a different read command?

Thanks in advance.


Solution

  • A different, and probably less elegant, solution with readLines, substr, trimws, separate (tidyr) and mutate_all (dplyr):

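    # The file has no line breaks, so txt is one long string
    txt <- readLines('filepath')
    # Cut it into 60-character records (8 + 4 + 7 + 41), one record per row
    dfx <- data.frame(V1 = sapply(seq(from=1, to=nchar(txt), by=60),
                                  function(x) substr(txt, x, x+59)),
                      stringsAsFactors = FALSE)
    
    library(dplyr)
    library(tidyr)
    # Split each record after characters 8, 12, 19 and 55, then trim the padding
    dfx %>% 
      separate(V1, c(paste0("V",LETTERS[1:5])), c(8,12,19,55)) %>% 
      mutate_all(trimws)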
    

    which gives:

            VA  VB    VC VD VE
    1 20141101 77h 3.210  0  3
    2 20141102 76h 3.090  0  3
    

    To get different column names, just replace c(paste0("V",LETTERS[1:5])) with a vector of the column names you want.

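    If you want the columns converted to their proper classes instead of being left as character, you can use funs(ul = type.convert(trimws(.))) inside mutate_all. Note that funs() is deprecated as of dplyr 0.8.0; in newer versions a formula such as ~ type.convert(trimws(.), as.is = TRUE) can be passed to mutate_all instead.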
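
For comparison, the same idea can be sketched in base R only: cut the long string into 60-character records and pass them to read.fwf() through a text connection, so that read.fwf() sees one record per line and does the type conversion itself. This is a minimal sketch, not part of the answer above. The helper name read_fwf_no_newlines, the column names and the sample string are made up for illustration, and it assumes the text length is an exact multiple of the record length.

```r
# Hypothetical helper, not from the original answer: cut one long string
# into fixed-length records and hand them to read.fwf().
read_fwf_no_newlines <- function(txt, widths,
                                 col.names = paste0("V", seq_along(widths)),
                                 ...) {
  rec_len <- sum(widths)                        # 60 for c(8, 4, 7, 41)
  # Assumption: a single non-empty string made of whole records only
  stopifnot(length(txt) == 1, nchar(txt) > 0, nchar(txt) %% rec_len == 0)
  starts  <- seq(1, nchar(txt), by = rec_len)   # first character of each record
  records <- substring(txt, starts, starts + rec_len - 1)
  con <- textConnection(records)                # one record per "line"
  on.exit(close(con))
  utils::read.fwf(con, widths = widths, col.names = col.names,
                  strip.white = TRUE, quote = "", comment.char = "", ...)
}

# Made-up sample: two 60-character records in the layout from the question
rec <- function(date, time, value) {
  paste0(date, time, value, strrep(" ", 34), "0", "    3 ")
}
txt <- paste0(rec("20141101", " 77h", "  3.210"),
              rec("20141102", " 76h", "  3.090"))

df <- read_fwf_no_newlines(
  txt, widths = c(8, 4, 7, 41),
  col.names  = c("date", "time", "value", "rest"),
  colClasses = c("character", "character", "numeric", "character")
)
str(df)
```

For a real file, txt would come from readLines('filepath') as in the answer above. Passing widths = c(8, 4, 7, 36, 5) with five column names should reproduce the five-column split that separate() makes.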