I have a dataset of this kind df:
df <- as.data.frame(matrix(c(1,1,1,1,2,2,
2020,2020,2020,2020,2020,2020,
1,2,1,2,3,6,
"Spain","Spain","France","France","Germany","Japan",
0,40,5,200,98,300,
1,3,2,19,4,11), ncol = 6))
names(df) <- c('ID', 'year', 'month', 'country', 'buy', 'sell')
I want to transform df. Keeping ID, year and month as reference variables and based on country I am trying to convert the rows into new variables, in the following way df_modified:
df_modified <- as.data.frame(matrix(c(1,1,2,2,
2020,2020,2020,2020,
1,2,3,6,
0,40,"NA","NA",
5,200,"NA","NA",
1,3,"NA","NA",
2,19,"NA","NA",
"NA","NA",98,300,
"NA","NA",4,11), ncol = 9))
names(df_modified) <- c('ID', 'year', 'month', 'buy_Spain', 'sell_Spain', 'buy_France','sell_France', 'buy_Germany', 'sell_Germany')
I have tried:
library(reshape2)
df_modified <- dcast(df, id+year+month ~ country)
However it does not make the trick.
Note: Numeric variables are numeric in my original dataset (country is character, etc).
Any clue?
Regards
We can specify the value.var
in dcast
library(data.table)
dcast(setDT(df), ID + year + month ~ country, value.var = c('buy', 'sell'))
Or using pivot_wider
library(tidyr)
library(dplyr)
df %>%
pivot_wider(names_from = country, values_from = c(buy, sell),
values_fill = list(buy = '0', sell = '0'))
# A tibble: 4 x 11
# ID year month buy_Spain buy_France buy_Germany buy_Japan sell_Spain sell_France sell_Germany sell_Japan
# <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#1 1 2020 1 0 5 0 0 1 2 0 0
#2 1 2020 2 40 200 0 0 3 19 0 0
#3 2 2020 3 0 0 98 0 0 0 4 0
#4 2 2020 6 0 0 0 300 0 0 0 11
If we need to convert to numeric class, just do type.convert
df %>%
type.convert(as.is = TRUE) %>%
pivot_wider(names_from = country, values_from = c(buy, sell),
values_fill = list(buy = 0, sell = 0))
# A tibble: 4 x 11
# ID year month buy_Spain buy_France buy_Germany buy_Japan sell_Spain sell_France sell_Germany sell_Japan
# <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
#1 1 2020 1 0 5 0 0 1 2 0 0
#2 1 2020 2 40 200 0 0 3 19 0 0
#3 2 2020 3 0 0 98 0 0 0 4 0
#4 2 2020 6 0 0 0 300 0 0 0 11