I have two data frames:
locA:
Date Time Temperature Voltage Longitude Latitude Deployment DateTime
2014-08-08 14:18:25 20.9 1.872 -126.6886 34.95633 A 2014-08-08 14:20:00
2014-08-08 14:38:25 19.8 1.862 -126.6899 34.95675 A 2014-08-08 14:40:00
2014-08-08 14:58:26 19.3 1.856 -126.6911 34.95732 A 2014-08-08 15:00:00
2014-08-08 15:18:26 18.8 1.872 -126.6921 34.95803 A 2014-08-08 15:20:00
2014-08-08 15:38:25 18.6 1.872 -126.6929 34.95872 A 2014-08-08 15:40:00
2014-08-08 15:58:26 18.4 1.859 -126.6937 34.95921 A 2014-08-08 16:00:00
2014-08-08 16:18:25 18.4 1.875 -126.6944 34.95983 A 2014-08-08 16:20:00
2014-08-08 16:38:25 18.4 1.875 -126.6949 34.96021 A 2014-08-08 16:40:00
2014-08-08 16:58:26 18.4 1.872 -126.6956 34.96033 A 2014-08-08 17:00:00
2014-08-08 17:18:26 18.4 1.872 -126.6962 34.96025 A 2014-08-08 17:20:00
and datA:
FileName Deployment Start_UTC Stop_UTC Duration sum PrAb dolPrAb
DASBR1_20140808$140000.wav A 2014-08-08 14:00:00 2014-08-08 14:02:00 119.6000 0 0 0
DASBR1_20140808$142000.wav A 2014-08-08 14:20:00 2014-08-08 14:22:00 119.6000 0 0 0
DASBR1_20140808$144000.wav A 2014-08-08 14:40:00 2014-08-08 14:42:00 119.8000 0 0 0
DASBR1_20140808$150000.wav A 2014-08-08 15:00:00 2014-08-08 15:02:00 119.6000 0 0 0
DASBR1_20140808$152000.wav A 2014-08-08 15:20:00 2014-08-08 15:22:00 119.6000 0 0 0
DASBR1_20140808$154000.wav A 2014-08-08 15:40:00 2014-08-08 15:42:00 119.6000 0 0 0
DASBR1_20140808$160000.wav A 2014-08-08 16:00:00 2014-08-08 16:02:00 119.6000 1 1 1
DASBR1_20140808$162000.wav A 2014-08-08 16:20:00 2014-08-08 16:22:00 119.8000 0 0 0
DASBR1_20140808$164000.wav A 2014-08-08 16:40:00 2014-08-08 16:42:00 119.8000 0 0 0
DASBR1_20140808$170000.wav A 2014-08-08 17:00:00 2014-08-08 17:02:00 119.6000 0 0 0
Each are actually several thousand rows long. I want to merge them based on the "DateTime" stamp in locA and the "Start_UTC" in datA. Both of those columns have been formatted identically in UTC.
locA$DateTime=as.POSIXct(paste(locA$Date, time), format="%Y-%m-%d %H:%M:%S", tz="UTC")
datA$Start_UTC = as.POSIXct(files$Start_UTC, format="%Y-%m-%d %H:%M:%S", tz="UTC")
In locA, the "time" variable comes from a sequence where I round the character string in locA$Time to the nearest 20min. "time" is a character string with the adjusted time.
str(time)
chr [1:8845] "14:00:00" "14:20:00" "14:40:00" "15:00:00" "15:20:00" "15:40:00" "16:00:00" "16:20:00" "16:40:00" "17:00:00" ...
When I merge the two data frames using locA$DateTime and datA$Start_UTC, it never works properly. I get something that looks like this, every time.
resA = merge(datA,locA, by.x=c("Start_UTC"), by.y=c("DateTime"))
resA:
Start_UTC FileName Deployment.x Stop_UTC Duration sum PrAb dolPrAb Date Time Temperature Voltage Longitude Latitude Deployment.y
2014-08-08 14:00:00 DASBR1_20140808$140000.wav A 2014-08-08 14:02:00 119.6000 0 0 0 2014-08-08 20:58:26 19.1 1.872 -126.7018 34.94994 A
2014-08-08 14:20:00 DASBR1_20140808$142000.wav A 2014-08-08 14:22:00 119.6000 0 0 0 2014-08-08 21:18:26 19.1 1.872 -126.7027 34.94801 A
2014-08-08 14:40:00 DASBR1_20140808$144000.wav A 2014-08-08 14:42:00 119.8000 0 0 0 2014-08-08 21:38:26 19.1 1.872 -126.7038 34.94608 A
2014-08-08 15:00:00 DASBR1_20140808$150000.wav A 2014-08-08 15:02:00 119.6000 0 0 0 2014-08-08 21:58:27 18.9 1.865 -126.7050 34.94394 A
2014-08-08 15:20:00 DASBR1_20140808$152000.wav A 2014-08-08 15:22:00 119.6000 0 0 0 2014-08-08 22:18:27 18.9 1.856 -126.7062 34.94188 A
2014-08-08 15:40:00 DASBR1_20140808$154000.wav A 2014-08-08 15:42:00 119.6000 0 0 0 2014-08-08 22:38:26 18.8 1.862 -126.7077 34.93994 A
2014-08-08 16:00:00 DASBR1_20140808$160000.wav A 2014-08-08 16:02:00 119.6000 1 1 1 2014-08-08 22:58:26 18.8 1.859 -126.7090 34.93796 A
2014-08-08 16:20:00 DASBR1_20140808$162000.wav A 2014-08-08 16:22:00 119.8000 0 0 0 2014-08-08 23:18:26 18.8 1.865 -126.7105 34.93622 A
2014-08-08 16:40:00 DASBR1_20140808$164000.wav A 2014-08-08 16:42:00 119.8000 0 0 0 2014-08-08 23:38:27 18.6 1.856 -126.7120 34.93480 A
2014-08-08 17:00:00 DASBR1_20140808$170000.wav A 2014-08-08 17:02:00 119.6000 0 0 0 2014-08-09 23:58:28 18.8 1.849 -126.8051 34.88381 A
I have tried this so many different ways. I tried keeping the locA$DateTime as is, where the time equals the time stamp in datA$Time, and then tried using this function to match it to the correct datA$Start_UTC. Match the minimum locA$timestamp to the datA$timestamp
findrow=function(dt,df) {min(which(df>dt))}
rowA=sapply(locA$DateTime, findrow, df=datA$Start_UTC)
resA=cbind(locA,datA[rowA,])
This gave me the same results. I don't know what's going on. I'm sure it's intrinsic to how the datetime stamps are generated, but I don't know how to fix this. If I format them the same, and tell R that they are UTC, shouldn't they be treated the same way?
Help, please, all help, comments, or humorous "hang in there" statements would greatly appreciated. Not being able to properly pair my lat/longs with my data timestamps is really causing a headache.
Cheers.
I wonder if doing a merge on chr objects instead of POSIXct objects would work better?
locA$DateTime_str = format(locA$DateTime)
datA$Start_UTC_str = format(datA$Start_UTC)
resA = merge(datA,locA, by.x=c("Start_UTC_str"), by.y=c("DateTime_str"))