Search code examples
rreshapepanel-data

How to reshape the data in R?


I have the dataset that consists of the following columns: Company Name, ISIN, Profit2008, Profit2009, Profit2010, Profitf2008, Profitf2009, Profitf2010, Sales2008, Sales2008, Sales2009, Sales2010, Salesf2008, Salesf2009, Salesf2010.

CName ISIN Profit2008 Profit2009 Profit2010 Profitf2008 Profitf2009 Profitf2010 Sales2008 Sales2009 Sales2010 Salesf2008 Salesf2009 Salesf2010
Name1 123  34500       4000      4043       92407        127138     150449       202926    2000      34500    34500       34500        34500       
Name2 456  50000       2000      1000       33588        34116      33105        33265     44545     4000      4000       4000         4000     

Basically, the dataset is for the period of 2008-2010 calendar years and 2008-2010 fiscal years. Is there a way to make it look like this: Company Name, ISIN, Year, Profit, Sales

The way I want it to look like is this:

CName   ISIN Year     Profit  Sales 
Name1   123  2008     34500   202926    
Name1   123  2009     4000    2000      
Name1   123  2010     4043    34500    
Name1   123  f2008    92407   34500       
Name1   123  f2009    127138  34500        
Name1   123  f2010    150449  34500        
Name2   456  2008     
Name2   456  2009     
Name2   456  2010
Name2   456  f2008
Name2   456  f2009
Name2   456  f2010(etc)

Any help is appreciated. Thank you!


Solution

  • Using dplyr:

    library(dplyr)
    library(tidyr)
    df = df %>% 
       pivot_longer(cols = -c(1,2),
                               names_to = c('type', 'year'),
                               names_pattern = "([A-Za-z]+)(\\d+)",
                               values_to = 'amount') %>%
       pivot_wider(names_from = 'type', values_from = 'amount')
    

    Produces the following output:

    df
    # A tibble: 6 x 7
      CName  ISIN year  Profit Profitf  Sales Salesf
      <fct> <int> <chr>  <int>   <int>  <int>  <int>
    1 Name1   123 2008   34500   92407 202926  34500
    2 Name1   123 2009    4000  127138   2000  34500
    3 Name1   123 2010    4043  150449  34500  34500
    4 Name2   456 2008   50000   33588  33265   4000
    5 Name2   456 2009    2000   34116  44545   4000
    6 Name2   456 2010    1000   33105   4000   4000
    

    Explanation of code:

    1. pivot_longer is used to well transform the data from wide to long. Except the first and second column, all other column names are converted into row names under the column type and the respective values are put under column amount. The values under the column type are then splitted using names_pattern argument to extract the year and placed under the year column. As an output you get the following:
    # A tibble: 24 x 5
       CName  ISIN type    year  amount
       <fct> <int> <chr>   <chr>  <int>
     1 Name1   123 Profit  2008   34500
     2 Name1   123 Profit  2009    4000
     3 Name1   123 Profit  2010    4043
     4 Name1   123 Profitf 2008   92407
     5 Name1   123 Profitf 2009  127138
     6 Name1   123 Profitf 2010  150449
     7 Name1   123 Sales   2008  202926
     8 Name1   123 Sales   2009    2000
     9 Name1   123 Sales   2010   34500
    10 Name1   123 Salesf  2008   34500
    # ... with 14 more rows
    
    1. This dataframe is then passed to pivot_wider to make the dataframe wide again based on the values in column type. So each distinct value in type becomes a separate column and the respective amount values are taken from amount column. And this gives the final required output.