I have the dataset that consists of the following columns: Company Name, ISIN, Profit2008, Profit2009, Profit2010, Profitf2008, Profitf2009, Profitf2010, Sales2008, Sales2008, Sales2009, Sales2010, Salesf2008, Salesf2009, Salesf2010.
CName ISIN Profit2008 Profit2009 Profit2010 Profitf2008 Profitf2009 Profitf2010 Sales2008 Sales2009 Sales2010 Salesf2008 Salesf2009 Salesf2010
Name1 123 34500 4000 4043 92407 127138 150449 202926 2000 34500 34500 34500 34500
Name2 456 50000 2000 1000 33588 34116 33105 33265 44545 4000 4000 4000 4000
Basically, the dataset is for the period of 2008-2010 calendar years and 2008-2010 fiscal years. Is there a way to make it look like this: Company Name, ISIN, Year, Profit, Sales
The way I want it to look like is this:
CName ISIN Year Profit Sales
Name1 123 2008 34500 202926
Name1 123 2009 4000 2000
Name1 123 2010 4043 34500
Name1 123 f2008 92407 34500
Name1 123 f2009 127138 34500
Name1 123 f2010 150449 34500
Name2 456 2008
Name2 456 2009
Name2 456 2010
Name2 456 f2008
Name2 456 f2009
Name2 456 f2010(etc)
Any help is appreciated. Thank you!
Using dplyr:
library(dplyr)
library(tidyr)
df = df %>%
pivot_longer(cols = -c(1,2),
names_to = c('type', 'year'),
names_pattern = "([A-Za-z]+)(\\d+)",
values_to = 'amount') %>%
pivot_wider(names_from = 'type', values_from = 'amount')
Produces the following output:
df
# A tibble: 6 x 7
CName ISIN year Profit Profitf Sales Salesf
<fct> <int> <chr> <int> <int> <int> <int>
1 Name1 123 2008 34500 92407 202926 34500
2 Name1 123 2009 4000 127138 2000 34500
3 Name1 123 2010 4043 150449 34500 34500
4 Name2 456 2008 50000 33588 33265 4000
5 Name2 456 2009 2000 34116 44545 4000
6 Name2 456 2010 1000 33105 4000 4000
Explanation of code:
pivot_longer
is used to well transform the data from wide to long. Except the first and second column, all other column names are converted into row names under the column type
and the respective values are put under column amount
. The values under the column type
are then splitted using names_pattern
argument to extract the year and placed under the year column. As an output you get the following:# A tibble: 24 x 5
CName ISIN type year amount
<fct> <int> <chr> <chr> <int>
1 Name1 123 Profit 2008 34500
2 Name1 123 Profit 2009 4000
3 Name1 123 Profit 2010 4043
4 Name1 123 Profitf 2008 92407
5 Name1 123 Profitf 2009 127138
6 Name1 123 Profitf 2010 150449
7 Name1 123 Sales 2008 202926
8 Name1 123 Sales 2009 2000
9 Name1 123 Sales 2010 34500
10 Name1 123 Salesf 2008 34500
# ... with 14 more rows
pivot_wider
to make the dataframe wide again based on the values in column type
. So each distinct value in type
becomes a separate column and the respective amount
values are taken from amount
column. And this gives the final required output.