I have a dataframe including around 300 columns. These columns are grouped. Here is the example:
id name fixed_2020 fixed_2019 fixed_2018 fixed_2017 fixed_2016 current_2020 current_2019 current_2018 current_2017 current_2016
1 A 2300 2100 2600 2600 1900 3000 3100 3200 3300 3400
I have around 20000 rows in total in the dataframe. And I would like to reshape this dataframe from wide to long in R. I tried to use function melt:
fixed <- melt(mydata, id.vars = c('id', 'name'), measure.vars = 3:7, variable.name = 'year', value.name = 'fixed')
Then I use gsub to get the year column
fixed$year <- as.character(gsub("[^0-9-]", "", debtors$year))
Here is what I want:
id name year fixed current
1 A 2020 2030 3000
2 A 2019 2100 3100
3 A 2018 2600 3200
4 A 2017 2600 3300
5 A 2016 1900 3400
While it does give me what I want, the process is time-consuming when I have more than 300 columns. Since my dataframe is grouped according to the variables and years (10 years for each variable), I wonder whether there are other functions to do it more efficiently.
Thanks a lot in advance!
Using data.table
:
library(data.table)
setDT(mydata)
result <- melt(mydata, id=c('id', 'name'),
measure.vars = patterns(fixed='fixed_', current='current_'),
variable.name = 'year')
years <- as.numeric(gsub('.+_(\\d+)', '\\1', grep('fixed', names(mydata), value = TRUE)))
result[, year:=years[year]]
result[, id:=seq(.N), by=.(name)]
result
## id name year fixed current
## 1: 1 A 2020 2300 3000
## 2: 2 A 2019 2100 3100
## 3: 3 A 2018 2600 3200
## 4: 4 A 2017 2600 3300
## 5: 5 A 2016 1900 3400
This should be very fast but your data-set is not very big tbh.
Note that this assumes the fixed and current columns are in the same order and associated with the same year(s). So if there is a fixed_2020
as the first fixed_*
column, there is also a current_2020
as the first current_*
column, and so on. Otherwise, the year
column will correctly associate with fixed
but not current