I have two data sets: the fire data set is huge, and the global temperature data set is considerably smaller.
I would like to match the two data sets on DISCOVERY_DATE = date, LATITUDE = Latitude and LONGITUDE = Longitude. I know most rows will not match exactly, so I am looking for the closest match possible. I think fuzzyjoin would be a good way to do this, but how would one match on all three columns at once?
I think the issue may be that I can't find a suitable match function to pass to it. This is what I have tried:
tempFire <- fuzzy_join(fires, Temps, multi_by = c("DISCOVERY_DATE" = "date", "LONGITUDE" = "Longitude", "LATITUDE" = "Latitude"), multi_match_fun = D, mode = "full")
Data
> head(z, n =10)
fires.LATITUDE fires.LONGITUDE fires.DISCOVERY_DATE
1 40.03694 -121.0058 1970-01-29
2 38.93306 -120.4044 1970-01-29
3 38.98417 -120.7356 1970-01-29
4 38.55917 -119.9133 1970-01-29
5 38.55917 -119.9331 1970-01-29
6 38.63528 -120.1036 1970-01-29
7 38.68833 -120.1533 1970-01-29
8 40.96806 -122.4339 1970-01-29
9 41.23361 -122.2833 1970-01-29
10 38.54833 -120.1492 1970-01-29
> head(b, n = 10)
Temps.Latitude Temps.Longitude Temps.date
1 32.95 -100.53 1992-01-01
2 32.95 -100.53 1992-02-01
3 32.95 -100.53 1992-03-01
4 32.95 -100.53 1992-04-01
5 32.95 -100.53 1992-05-01
6 32.95 -100.53 1992-06-01
7 32.95 -100.53 1992-07-01
8 32.95 -100.53 1992-08-01
9 32.95 -100.53 1992-09-01
10 32.95 -100.53 1992-10-01
I would recommend that you come up with an appropriate distance metric based on a weighted combination of temporal distance (i.e. subtracting the dates) and spatial distance (based on lat & long). Determine the weights based on the relative importance of spatial and temporal proximity for your application. Then compute a matrix containing the distance from every point in the first data set to every point in the second data set using this distance metric. Finally, find the minimum distance in each row and/or column to select data points in one dataset that are closest to the points in the other data set. You will probably want to discard any pairs with a distance greater than some threshold.
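The steps above can be sketched in base R roughly as follows. This is only an illustration under stated assumptions: the toy rows, the haversine_km helper, the weights w_space and w_time, and the threshold max_dist are all made up for the example and are not taken from your data, so you would need to choose values that suit your application.

```r
# Toy stand-ins for the two data sets (illustrative values, not real data)
fires <- data.frame(
  LATITUDE       = c(40.03694, 38.93306, 33.10000),
  LONGITUDE      = c(-121.0058, -120.4044, -100.6000),
  DISCOVERY_DATE = as.Date(c("1992-01-29", "1992-02-10", "1992-03-05"))
)
Temps <- data.frame(
  Latitude  = c(32.95, 32.95, 39.00),
  Longitude = c(-100.53, -100.53, -120.50),
  date      = as.Date(c("1992-01-01", "1992-03-01", "1992-02-01"))
)

# Hypothetical helper: great-circle (haversine) distance in km
haversine_km <- function(lat1, lon1, lat2, lon2, r = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

# Assumed weights and cutoff: here 1 km counts the same as 1 day
w_space  <- 1
w_time   <- 1
max_dist <- 300

# Distance matrices: rows are fires, columns are temperature records
space_km <- outer(
  seq_len(nrow(fires)), seq_len(nrow(Temps)),
  function(i, j) haversine_km(fires$LATITUDE[i], fires$LONGITUDE[i],
                              Temps$Latitude[j], Temps$Longitude[j])
)
time_days <- abs(outer(as.numeric(fires$DISCOVERY_DATE),
                       as.numeric(Temps$date), "-"))
d <- w_space * space_km + w_time * time_days

# Closest temperature record for each fire, and the distance to it
best      <- apply(d, 1, which.min)
best_dist <- d[cbind(seq_len(nrow(d)), best)]

# Attach the matched record and drop pairs beyond the threshold
matched <- cbind(fires, Temps[best, ], dist = best_dist)
matched <- matched[matched$dist <= max_dist, ]
```

Because the fire data set is huge, the full matrix may not fit in memory. One option is to run the same computation on blocks of fire rows (for example a few thousand at a time) and bind the results together.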