Search code examples
rcsvoutputdbf

Read multiple DBF files in R


I have multiple DBF files in C:\Deskop\Test folder. For example, I have listed 5 of the DBF file names below.

week_1_avg_199811_var86_TBL.dbf
week_1_avg_200510_var86_TBL.dbf
week_2_avg_200009_var86_TBL.dbf
week_2_avg_200309_var86_TBL.dbf
week_3_avg_200011_var86_TBL.dbf

Each file has 5 columns and 1410 rows. For example, (week_1_avg_199811_var86_TBL.dbf) I will list 5 rows. Column 1 to 4 are identical in each DBF file.

T_R          ZONE     COUNT     AREA         MEAN
T10S-R26W    1        1         0.0156       29.9412
T10S-R27W    1        1         0.0156       30.1242
T10S-R28W    1        1         0.0156       29.7399
T10S-R2E     1        1         0.0156       37.7858
T10S-R2W     1        1         0.0156       34.6999

I want to output one CSV file (Output.csv) reading above (all) DBF files as follows.

Output.csv

YEAR    T_R         Sep_week_1    Oct_week_1       Nov_week_1
1998    T10S-R26W   <some_value>       <some_value>     29.9412 
1998    T10S-R27W   <some_value>       <some_value>     30.1242 
1998    T10S-R28W   <some_value>       <some_value>     29.7399 
1998    T10S-R2E    <some_value>       <some_value>     37.7858 
1998    T10S-R2W    <some_value>      <some_value>      34.6999  
2005    T10S-R26W   <some_value>      <some_value>      <some_value>
2005    T10S-R27W   <some_value>      <some_value>      <some_value>
2005    T10S-R28W   <some_value>      <some_value>      <some_value>
2005    T10S-R2E    <some_value>       <some_value>     <some_value>
2005    T10S-R2W    <some_value>       <some_value>     <some_value>

Can somebody help me how can I achieve this in R?

I have the following code that reads all the .dbf files in the folder but I need to create YEAR and weekly values columns by reading file name.

rm(list=ls(all=TRUE))
setwd("C:/Deskop/Test")
graphics.off()

    library("foreign")
library("purrr")

readDBA <- function(file){
  df <- read.dbf(file, as.is=FALSE)
  df$fileName <- file
  return(df)
}

file.names <- list.files(pattern="*\\.dbf")

combinedData <- file.names %>% map_dfr(readDBA)
combinedData$Year <- gsub(".*_(\\d{4}).*", "\\1", combinedData$fileName)
combinedData$Month <- as.numeric(gsub(".*_\\d{4}(\\d{2}).*", "\\1", combinedData$fileName))
combinedData$Month_Abb <- month.abb[combinedData$Month]
combinedData$Week <- substr(combinedData$fileName, 6, 6)
cols <- c("Month_Abb","Week")
combinedData$Month_Week <- do.call(paste, c(combinedData[cols], sep="_wk"))
DT <- dcast(combinedData, Year + T_R + Month_Week + MEAN ~ Month_Week) 

See the image of "Output.csv" file. enter image description here

Using Month_Week as value column: use value.var to override.

Any help is appreciated,

Thanks in advance.


Solution

  • The trick here is to define a function to read in the .dba files, and add a file name tag. Then we'll use purrr:map_dfr to read in, then rbind() each file in one step.

    library(purrr)
    library(foreign)
    
    readDBA <- function(file){
      df <- read.dba(file, as.is=FALSE)
      df$fileName <- file
      return(df)
    }
    
    file.names <- list.files(pattern="*\\.dba")
    
    combinedData <- file.names %>% map_dfr(readDBA)
    

    Then a little REGEX to get Year and Month

    combinedData$Year <- gsub(".*_(\\d{4}).*", "\\1", combinedData$fileName)
    combinedData$Month <- gsub(".*_\\d{4}(\\d{2}).*", "\\1", combinedData$fileName)
    

    I'll let you figure out putting week and month together.

    Then you'll want to reshape2::dcast() to a wide format, something like this but I can't say for sure unless you provide an actual sample of your data frame.

    require(reshape2)
    
    dcast(combinedData, Year + T_R ~ MonthWeek, value.var= "MEAN")
    

    Also, R has a handy month.abb vector so you can get your month abbreviations using month.abb[combinedData$Month]