Search code examples
rformatpanel-data

Long to wide change in panel data, but only for certain values in rows


I've browsed extensively online but could so far not find an appropriate answer for my question in this specific case.

I'm looking to partly re-structure a panel data set from long to wide format, but only for specific values that are specified by their respective names/characters in rows in R.

Consider this original format:

          SERIES       ECONOMY      YEAR     Value
246        CPI         Panama       1960     0.05
247        CPI         Peru         1960     0.05
248        CPI         XXXXXX       1960     0.05
249        CPI         Panama       1961     0.06
250        CPI         Peru         1961     0.06
251        CPI         XXXXXX       1961     0.06
252   % Gross savings  Panama       1960     5
253   % Gross savings  Peru         1960     6
254   % Gross savings  XXXXXX       1960     7
255   % Gross savings  Panama       1961     20
256   % Gross savings  Peru         1961     21
257   % Gross savings  XXXXXX       1961     22

(And so on for different countries, different indicators in the "SERIES" column, during 1960-2020 for each country and indicator.)

I'm looking to keep "ECONOMY" as its own column specifying the country as originally seen, keep the year as a column as well, but move each separate indicator under SERIES (e.g. CPI / % Gross savings) into their own columns like this:

          ECONOMY       YEAR      CPI      %_GROSS_SAVINGS
1         Panama        1960      0.05     5
2         Peru          1960      0.05     6
3         XXXXXX        1960      0.05     7
4         Panama        1961      0.06     20
5         Peru          1961      0.06     21
6         XXXXXX        1961      0.06     22

Any ideas? Grateful for answers.


Solution

  • reshape2

    reshape2::dcast(ECONOMY + YEAR ~ SERIES, data = zz)
    # Using Value as value column: use value.var to override.
    #   ECONOMY YEAR %_Gross_savings  CPI
    # 1  Panama 1960               5 0.05
    # 2  Panama 1961              20 0.06
    # 3    Peru 1960               6 0.05
    # 4    Peru 1961              21 0.06
    # 5  XXXXXX 1960               7 0.05
    # 6  XXXXXX 1961              22 0.06
    

    Data

    zz <- structure(list(SERIES = c("CPI", "CPI", "CPI", "CPI", "CPI", "CPI", "%_Gross_savings", "%_Gross_savings", "%_Gross_savings", "%_Gross_savings", "%_Gross_savings", "%_Gross_savings"), ECONOMY = c("Panama", "Peru", "XXXXXX", "Panama", "Peru", "XXXXXX", "Panama", "Peru", "XXXXXX", "Panama", "Peru", "XXXXXX"), YEAR = c(1960L, 1960L, 1960L, 1961L, 1961L, 1961L, 1960L, 1960L, 1960L, 1961L, 1961L, 1961L), Value = c(0.05, 0.05, 0.05, 0.06, 0.06, 0.06, 5, 6, 7, 20, 21, 22)), class = "data.frame", row.names = c("246",  "247", "248", "249", "250", "251", "252", "253", "254", "255", "256", "257"))