I have multiple DBF files in C:\Deskop\Test folder. For example, I have listed 5 of the DBF file names below.
week_1_avg_199811_var86_TBL.dbf
week_1_avg_200510_var86_TBL.dbf
week_2_avg_200009_var86_TBL.dbf
week_2_avg_200309_var86_TBL.dbf
week_3_avg_200011_var86_TBL.dbf
Each file has 5 columns and 1410 rows. For example, (week_1_avg_199811_var86_TBL.dbf) I will list 5 rows. Column 1 to 4 are identical in each DBF file.
T_R ZONE COUNT AREA MEAN
T10S-R26W 1 1 0.0156 29.9412
T10S-R27W 1 1 0.0156 30.1242
T10S-R28W 1 1 0.0156 29.7399
T10S-R2E 1 1 0.0156 37.7858
T10S-R2W 1 1 0.0156 34.6999
I want to output one CSV file (Output.csv) reading above (all) DBF files as follows.
Output.csv
YEAR T_R Sep_week_1 Oct_week_1 Nov_week_1
1998 T10S-R26W <some_value> <some_value> 29.9412
1998 T10S-R27W <some_value> <some_value> 30.1242
1998 T10S-R28W <some_value> <some_value> 29.7399
1998 T10S-R2E <some_value> <some_value> 37.7858
1998 T10S-R2W <some_value> <some_value> 34.6999
2005 T10S-R26W <some_value> <some_value> <some_value>
2005 T10S-R27W <some_value> <some_value> <some_value>
2005 T10S-R28W <some_value> <some_value> <some_value>
2005 T10S-R2E <some_value> <some_value> <some_value>
2005 T10S-R2W <some_value> <some_value> <some_value>
Can somebody help me how can I achieve this in R?
I have the following code that reads all the .dbf files in the folder but I need to create YEAR and weekly values columns by reading file name.
rm(list=ls(all=TRUE))
setwd("C:/Deskop/Test")
graphics.off()
library("foreign")
library("purrr")
readDBA <- function(file){
df <- read.dbf(file, as.is=FALSE)
df$fileName <- file
return(df)
}
file.names <- list.files(pattern="*\\.dbf")
combinedData <- file.names %>% map_dfr(readDBA)
combinedData$Year <- gsub(".*_(\\d{4}).*", "\\1", combinedData$fileName)
combinedData$Month <- as.numeric(gsub(".*_\\d{4}(\\d{2}).*", "\\1", combinedData$fileName))
combinedData$Month_Abb <- month.abb[combinedData$Month]
combinedData$Week <- substr(combinedData$fileName, 6, 6)
cols <- c("Month_Abb","Week")
combinedData$Month_Week <- do.call(paste, c(combinedData[cols], sep="_wk"))
DT <- dcast(combinedData, Year + T_R + Month_Week + MEAN ~ Month_Week)
See the image of "Output.csv" file.
Using Month_Week as value column: use value.var to override.
Any help is appreciated,
Thanks in advance.
The trick here is to define a function to read in the .dba
files, and add a file name tag. Then we'll use purrr:map_dfr
to read in, then rbind()
each file in one step.
library(purrr)
library(foreign)
readDBA <- function(file){
df <- read.dba(file, as.is=FALSE)
df$fileName <- file
return(df)
}
file.names <- list.files(pattern="*\\.dba")
combinedData <- file.names %>% map_dfr(readDBA)
Then a little REGEX to get Year and Month
combinedData$Year <- gsub(".*_(\\d{4}).*", "\\1", combinedData$fileName)
combinedData$Month <- gsub(".*_\\d{4}(\\d{2}).*", "\\1", combinedData$fileName)
I'll let you figure out putting week and month together.
Then you'll want to reshape2::dcast()
to a wide format, something like this but I can't say for sure unless you provide an actual sample of your data frame.
require(reshape2)
dcast(combinedData, Year + T_R ~ MonthWeek, value.var= "MEAN")
Also, R has a handy month.abb
vector so you can get your month abbreviations using month.abb[combinedData$Month]