The operation I want to achieve is similar to an Excel-style INDEX MATCH. Essentially, I have a frame:
dta <- read_excel('CPI/Global Inflation New.xlsm',
sheet = 'Cleaned', skip = 1)
colnames(dta)[1] <- 'Date'
dta$Date <- as.Date(dta$Date, format='%d%m%Y')
That looks like this:
| Date US Canada EU Germany
---------------------------------------------------
1 | 2023-03-15 3.1 2.8 0.9 1.3
2 | 2023-04-15 3.3 2.5 1.2 1.5
3 | 2023-05-15 3.2 2.6 1.0 1.4
.
45 | 2024-04-15 2.9 2.7 2.1 2.1
46 | 2024-05-15 NA 2.6 NA 2.3
And I want to create a new frame which has the country headers as one column, and the most recent value for that country in the second column. For example:
| Country Latest
-----------------------------
1 | US 2.9
2 | Canada 2.6
3 | EU 2.1
That 'new frame' so far is:
new_frm <- data.frame(colnames(dta[, -1]))
colnames(new_frm)[1] <- 'Country'
new_frm <- cbind(new_frm, "Latest"=NA)
I'm having trouble figuring out how to extract the relevant values from the dta frame. In Excel I would just pass the country name as a lookup value to the MATCH function, and then return the corresponding value (perhaps for the three most recent dates). Replicating this is R has proven more challenging... not really sure how to go about this. Any advice would be appreciated.
You can get the data in long format, drop the NA
values and get the corresponding value for the latest date available for that country/region.
library(dplyr)
library(tidyr)
dta %>%
pivot_longer(cols = -Date, names_to = "Country", values_to = "Latest",
values_drop_na = TRUE) %>%
summarise(Latest = Latest[which.max(Date)], .by = Country)
# Country Latest
# <chr> <dbl>
#1 US 2.9
#2 Canada 2.6
#3 EU 2.1
#4 Germany 2.3