My data set contains values for multiple scenarios (S1, S2, S3) across multiple years, for a linking variable called link_col
.
link_col <- c("A0001", "A0002", "A0003")
S1_ABC_2019 <- c(1000, 300, 5000)
S1_ABC_2020 <- c(1000, 300, 5000)
S1_ABC_2021 <- c(1000, 300, 5000)
S2_ABC_2012 <- c(800, 600, 2000)
S2_ABC_2013 <- c(800, 600, 2000)
S2_ABC_2019 <- c(800, 600, 2000)
S3_ABC_2013 <- c(600, 100, 900)
S3_ABC_2020 <- c(600, 100, 900)
S3_ABC_2021 <- c(600, 100, 900)
data <- data.frame(link_col, S1_ABC_2019, S1_ABC_2020, S1_ABC_2021, S2_ABC_2012,
S2_ABC_2013, S2_ABC_2019, S3_ABC_2013, S3_ABC_2020, S3_ABC_2021)
Using dplyr
want to pivot_longer()
this dataset by condensing these wide columns into one each for scenario, year, and value, to make figures in ggplot. This is what I'm after:
link_col Scenario Year Value
1 A0001 S1 2019 1000
2 A0002 S1 2020 300
3 A0003 S1 2021 5000
4 A0001 S2 2012 1000
5 A0002 S2 2013 300
6 A0003 S2 2019 5000
7 A0001 S3 2013 1000
8 A0002 S3 2020 300
9 A0003 S3 2021 5000
10 A0001 S1 2019 800
11 A0002 S1 2020 600
12 A0003 S2 2013 2000
13 A0001 S2 2019 800
14 A0002 S3 2015 600
15 A0003 S3 2021 2000
16 A0001 S1 2019 800
17 A0002 S1 2020 600
18 A0003 S1 2021 2000
19 A0001 S2 2012 600
20 A0002 S2 2013 100
21 A0003 S2 2019 900
22 A0001 S3 2013 600
23 A0002 S3 2020 100
24 A0003 S3 2021 900
25 A0001 S1 2019 600
26 A0002 S1 2020 100
27 A0003 S2 2013 900
28 A0001 S2 2019 1100
29 A0002 S3 2015 400
30 A0003 S3 2021 4800
31 A0001 S1 2019 1100
32 A0002 S1 2020 400
33 A0003 S1 2021 4800
34 A0001 S2 2012 900
35 A0002 S2 2013 600
36 A0003 S2 2019 2100
37 A0001 S3 2013 900
38 A0002 S3 2020 600
39 A0003 S3 2021 2100
40 A0001 S1 2019 500
41 A0002 S1 2020 200
42 A0003 S2 2013 800
43 A0001 S2 2019 500
44 A0002 S3 2015 200
45 A0003 S3 2021 800
I've tried this but it doesn't work, and I don't really understand the use of the names_pattern
parameter. Any help would be appreciated!
data2 <- pivot_longer(data,
cols = link_col, starts_with("S"),
names_to = c("Scenario", "Year"),
names_pattern = "S(\\d+)_ABC_2(\\d+)",
values_to = Value)
You were pretty close:
library(tidyr)
pivot_longer(data, cols = -link_col,
names_pattern = "(S\\d+)_ABC_(2\\d+)",
names_to = c("Scenario", "Year"),
values_to = "Value")
output
# # A tibble: 27 × 4
# link_col Scenario Year Value
# <chr> <chr> <chr> <dbl>
# 1 A0001 S1 2019 1000
# 2 A0001 S1 2020 1000
# 3 A0001 S1 2021 1000
# 4 A0001 S2 2012 800
# 5 A0001 S2 2013 800
# 6 A0001 S2 2019 800
# 7 A0001 S3 2013 600
# 8 A0001 S3 2020 600
# 9 A0001 S3 2021 600
# 10 A0002 S1 2019 300
# # ℹ 17 more rows
# # ℹ Use `print(n = ...)` to see more rows
Some explanations:
names_pattern
, the parentheses are used to capture groups, and names_to
will name those groups. You want one group for the Scenario, which is captured by including the "S", and your second group is for the years.cols
argument needs all columns to pivot into a longer format, so here it's everything but the link_col
.