Search code examples
rmultiple-columnsstore

How to store the values that match the condition in a separate column in the dataframe in R?


I have the dataset that looks like this:


    ISIN        Variable    Value
    182         Sales2008   276
    182         Sales2009   800
    182         Sales2010   900
    182         Sales2011   273
    182         Sales2012   276
    182         Sales2013   123
    182         Sales2014   345
    182         Sales2015   444
    182         Sales2016   666
    182         Sales2017   777
    182         Sales2018   999
    182         Sales2019   235
    182         Salesf2008   276
    182         Salesf2009   800
    182         Salesf2010   900
    182         Salesf2011   273
    182         Salesf2012   276
    182         Salesf2013   123
    182         Salesf2014   345
    182         Salesf2015   444
    182         Salesf2016   666
    182         Salesf2017   777
    182         Salesf2018   999
    182         Salesf2019   235
    182         ROA2008      333
    182         ROA2009      333
    182         ROA2010      333
    182         ROA2011      333
    182         ROA2012      333
    182         ROA2013      333
    182         ROA2014      333
    182         ROA2015      333
    182         ROA2016      333
    182         ROA2017      333
    182         ROA2018      333
    182         ROA2019      333

Do you have any suggestion how to create a new df where the values from Sales2008 to Sales2019 would be stored in one column? Also, the values for Salesf2008 to Salesf2019 would be stored in the next column, and the values for ROA2008 to ROA2019 in the next column?

Thanks in advance!


Solution

  • Hard to say without your expected output, but it sounds like you want this. This uses extract() to make two columns by extracting the 1) letters and 2) 4 digits, and pivot_wider() to go from long to wide using the new column: "Col" and your "Values" column. Hope it helps!!

    library(tidyr)
    library(dplyr) # for %>%
    
    df1 %>%
      extract(Variable, c("Col", "Year"), "([A-Za-z]+)(\\d{4})", convert = TRUE) %>% 
      pivot_wider(names_from = Col, values_from = Value) 
    
    # A tibble: 12 x 5
        ISIN  Year  Sales Salesf   ROA
       <int> <int> <int>  <int> <int>
     1   182 2008    276    276   333
     2   182 2009    800    800   333
     3   182 2010    900    900   333
     4   182 2011    273    273   333
     5   182 2012    276    276   333
     6   182 2013    123    123   333
     7   182 2014    345    345   333
     8   182 2015    444    444   333
     9   182 2016    666    666   333
    10   182 2017    777    777   333
    11   182 2018    999    999   333
    12   182 2019    235    235   333
    

    Data:

    df1 <- read.table(header = TRUE, text = "ISIN        Variable    Value
    182         Sales2008   276
    182         Sales2009   800
    182         Sales2010   900
    182         Sales2011   273
    182         Sales2012   276
    182         Sales2013   123
    182         Sales2014   345
    182         Sales2015   444
    182         Sales2016   666
    182         Sales2017   777
    182         Sales2018   999
    182         Sales2019   235
    182         Salesf2008   276
    182         Salesf2009   800
    182         Salesf2010   900
    182         Salesf2011   273
    182         Salesf2012   276
    182         Salesf2013   123
    182         Salesf2014   345
    182         Salesf2015   444
    182         Salesf2016   666
    182         Salesf2017   777
    182         Salesf2018   999
    182         Salesf2019   235
    182         ROA2008      333
    182         ROA2009      333
    182         ROA2010      333
    182         ROA2011      333
    182         ROA2012      333
    182         ROA2013      333
    182         ROA2014      333
    182         ROA2015      333
    182         ROA2016      333
    182         ROA2017      333
    182         ROA2018      333
    182         ROA2019      333")