I have a dataset with 3 different indices, and I need the difference between HPI 1 and HPI2 and the difference between HPI2 and HPI3, both for rent and sale. The first thing that comes to my mind is to transform this DF into one with one column per HPI and then create new columns with a simple subtraction, but I don't know how to do it. The other idea is to rest the values from the same column conditioning the operation and the date. This is an example of the data
~Index, ~Value, ~Operation, ~Year,
"HPI1", 0.9, "Sale", "2017",
"HPI2", 1.1, "Sale", "2017",
"HPI3", 0.89, "Sale", "2017",
"HPI1", 1.12, "Rent", "2017",
"HPI2", 0.85, "Rent", "2017",
"HPI3", 1.22, "Rent", "2017",
"HPI1", 0.91, "Sale", "2018",
"HPI2", 1.02, "Sale", "2018",
"HPI3", 0.9, "Sale", "2018",
"HPI1", 1.1, "Rent", "2018",
"HPI2", 0.89, "Rent", "2018",
"HPI3", 1.12, "Rent", "2018",)
I would be very thankful for any help. Thanks!
Here is another way:
library(dplyr)
library(tidyr)
df <- tribble(
~Index, ~Value, ~Operation, ~Year,
"HPI1", 0.9, "Sale", "2017",
"HPI2", 1.1, "Sale", "2017",
"HPI3", 0.89, "Sale", "2017",
"HPI1", 1.12, "Rent", "2017",
"HPI2", 0.85, "Rent", "2017",
"HPI3", 1.22, "Rent", "2017",
"HPI1", 0.91, "Sale", "2018",
"HPI2", 1.02, "Sale", "2018",
"HPI3", 0.9, "Sale", "2018",
"HPI1", 1.1, "Rent", "2018",
"HPI2", 0.89, "Rent", "2018",
"HPI3", 1.12, "Rent", "2018")
df %>%
group_by(Year, Operation) %>%
pivot_wider(names_from = Index, values_from = Value) %>%
mutate(HPI1_HPI2_diff = HPI1 - HPI2,
HPI2_HPI3_diff = HPI2 - HPI3)
# A tibble: 4 x 7
# Groups: Year, Operation [4]
Operation Year HPI1 HPI2 HPI3 HPI1_HPI2_diff HPI2_HPI3_diff
<chr> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Sale 2017 0.9 1.1 0.89 -0.2 0.21
2 Rent 2017 1.12 0.85 1.22 0.27 -0.37
3 Sale 2018 0.91 1.02 0.9 -0.110 0.12
4 Rent 2018 1.1 0.89 1.12 0.21 -0.23