
Why am I getting zeros when computing growth rates by country-year-sector in my data using tidyverse?


I want to compute a growth rate for each country-year-sector in the following dataset:

> sapply(sa1, class)
     country         year       sector sector_share 
    "factor"    "numeric"     "factor"    "numeric" 
> print(sa1)
               country year        sector sector_share
1   Sub-Saharan Africa 1981   agriculture    15.724457
2   Sub-Saharan Africa 1982   agriculture    16.165780
3   Sub-Saharan Africa 1983   agriculture    15.908671
4   Sub-Saharan Africa 1984   agriculture    17.593971
5   Sub-Saharan Africa 1985   agriculture    19.428871
6   Sub-Saharan Africa 1986   agriculture    19.593291
7   Sub-Saharan Africa 1987   agriculture    19.789807
8   Sub-Saharan Africa 1988   agriculture    20.597277
9   Sub-Saharan Africa 1989   agriculture    19.933259
10  Sub-Saharan Africa 1990   agriculture    19.790467

42  Sub-Saharan Africa 1981      industry    35.516119
43  Sub-Saharan Africa 1982      industry    32.407578
44  Sub-Saharan Africa 1983      industry    32.303477
45  Sub-Saharan Africa 1984      industry    30.437994
46  Sub-Saharan Africa 1985      industry    30.544564
47  Sub-Saharan Africa 1986      industry    29.458658
48  Sub-Saharan Africa 1987      industry    29.490104
49  Sub-Saharan Africa 1988      industry    29.009534
50  Sub-Saharan Africa 1989      industry    29.340000
51  Sub-Saharan Africa 1990      industry    29.698078
52  Sub-Saharan Africa 1991      industry    28.727260

83  Sub-Saharan Africa 1981 manufacturing    18.419694
84  Sub-Saharan Africa 1982 manufacturing    17.895412
85  Sub-Saharan Africa 1983 manufacturing    18.037958
86  Sub-Saharan Africa 1984 manufacturing    16.316419
87  Sub-Saharan Africa 1985 manufacturing    16.256940
88  Sub-Saharan Africa 1986 manufacturing    15.728073
89  Sub-Saharan Africa 1987 manufacturing    15.825253
90  Sub-Saharan Africa 1988 manufacturing    16.320170
91  Sub-Saharan Africa 1989 manufacturing    16.062034
92  Sub-Saharan Africa 1990 manufacturing    16.134401
93  Sub-Saharan Africa 1991 manufacturing    15.826331

124 Sub-Saharan Africa 1981      services    44.946512
125 Sub-Saharan Africa 1982      services    46.323757
126 Sub-Saharan Africa 1983      services    46.071141
127 Sub-Saharan Africa 1984      services    45.820815
128 Sub-Saharan Africa 1985      services    43.226268
129 Sub-Saharan Africa 1986      services    43.409858
130 Sub-Saharan Africa 1987      services    44.298582
131 Sub-Saharan Africa 1988      services    43.191570
132 Sub-Saharan Africa 1989      services    43.023115
133 Sub-Saharan Africa 1990      services    44.043939
134 Sub-Saharan Africa 1991      services    44.995853


I use the following code:

sa1 <- sa1 %>%
  group_by(country, year, sector) %>%
  arrange(year) %>%
  mutate(growth_rate = ifelse(!is.na(lag(sector_share)), (sector_share / lag(sector_share) - 1) * 100, 0))

But I obtain zeros, which should not happen since there are no NAs in the sector_share column.

> print(sa1)
# A tibble: 164 × 5
# Groups:   country, year, sector [164]
   country             year sector        sector_share growth_rate
   <fct>              <dbl> <fct>                <dbl>       <dbl>
 1 Sub-Saharan Africa  1981 agriculture           15.7           0
 2 Sub-Saharan Africa  1981 industry              35.5           0
 3 Sub-Saharan Africa  1981 manufacturing         18.4           0
 4 Sub-Saharan Africa  1981 services              44.9           0
 5 Sub-Saharan Africa  1982 agriculture           16.2           0
 6 Sub-Saharan Africa  1982 industry              32.4           0
 7 Sub-Saharan Africa  1982 manufacturing         17.9           0
 8 Sub-Saharan Africa  1982 services              46.3           0
 9 Sub-Saharan Africa  1983 agriculture           15.9           0
10 Sub-Saharan Africa  1983 industry              32.3           0
# ℹ 154 more rows
# ℹ Use `print(n = ...)` to see more rows

I tried to compute the growth rate, but I obtain zeros. This makes no sense to me: my data has no NAs in the sector_share column, and the code even checks for NAs just in case.

Can someone help me? Thank you!
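The zeros can be reproduced on a few of the rows printed above. The sketch below assumes dplyr is installed; `sa_small`, `group_sizes` and `diag` are hypothetical names, and the values are copied from the agriculture and industry rows for 1981-1983. It shows that grouping by country, year and sector yields one-row groups, so `lag()` never has a previous value to return.

```r
library(dplyr)

# Hypothetical sample: a few rows copied from the printout above
sa_small <- tibble(
  country = "Sub-Saharan Africa",
  year = rep(1981:1983, times = 2),
  sector = rep(c("agriculture", "industry"), each = 3),
  sector_share = c(15.724457, 16.165780, 15.908671,
                   35.516119, 32.407578, 32.303477)
)

# Grouping by country, year AND sector: how many rows land in each group?
group_sizes <- sa_small %>% count(country, year, sector)
print(group_sizes)   # n is 1 for every group

# Inside a one-row group, lag() has no earlier row and returns NA,
# so an ifelse(!is.na(lag(...)), ..., 0) check always takes the 0 branch
diag <- sa_small %>%
  group_by(country, year, sector) %>%
  mutate(prev_share = lag(sector_share)) %>%
  ungroup()
print(diag)          # prev_share is NA on every row
```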


Solution

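  • Since you’re grouping by year, every group (one country × year × sector combination) contains exactly one row, as the `# Groups: country, year, sector [164]` header for your 164 rows shows. Inside a one-row group, `lag()` has no previous value and returns `NA`, so your `ifelse()` check always falls back to `0`. Group by country and sector only, so that each group holds a sector’s full time series: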

    
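    library(dplyr)
    
    sa1 %>%
      # group by country and sector only, so each group holds a full time series
      group_by(country, sector) %>%
      # sort by year so that lag() returns the previous year's value
      arrange(year) %>%
      # growth relative to the previous year; 0 for the first year, which has no earlier value
      mutate(growth_rate = ifelse(!is.na(lag(sector_share)), (sector_share / lag(sector_share) - 1) * 100, 0))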
    
    # A tibble: 43 × 5
    # Groups:   country, sector [4]
       country  year sector        sector_share growth_rate
       <chr>   <int> <chr>                <dbl>       <dbl>
     1 Africa   1981 agriculture           15.7       0    
     2 Africa   1981 industry              35.5       0    
     3 Africa   1981 manufacturing         18.4       0    
     4 Africa   1981 services              44.9       0    
     5 Africa   1982 agriculture           16.2       2.81 
     6 Africa   1982 industry              32.4      -8.75 
     7 Africa   1982 manufacturing         17.9      -2.85 
     8 Africa   1982 services              46.3       3.06 
     9 Africa   1983 agriculture           15.9      -1.59 
    10 Africa   1983 industry              32.3      -0.321
    # ℹ 33 more rows
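A possible variant, assuming `NA` is preferable to `0` for the first year of each series (there is no earlier year to compare against, and `0` reads as "no change"). It passes `order_by = year` to `lag()` so the result does not depend on the rows being sorted, and calls `ungroup()` at the end. `sa_small` and `growth` are hypothetical names; the values are copied from the question's printout. This is a sketch, not the only reasonable convention.

```r
library(dplyr)

# Hypothetical sample: a few rows copied from the question's printout
sa_small <- tibble(
  country = "Sub-Saharan Africa",
  year = rep(1981:1983, times = 2),
  sector = rep(c("agriculture", "industry"), each = 3),
  sector_share = c(15.724457, 16.165780, 15.908671,
                   35.516119, 32.407578, 32.303477)
)

growth <- sa_small %>%
  arrange(desc(year)) %>%   # deliberately out of order, to show order_by at work
  group_by(country, sector) %>%
  # previous year's share within each country-sector series; NA for the first year
  mutate(growth_rate = (sector_share / lag(sector_share, order_by = year) - 1) * 100) %>%
  ungroup() %>%
  arrange(sector, year)

print(growth)
```

Because the first year is left as `NA`, summaries such as `mean(growth_rate, na.rm = TRUE)` skip it instead of averaging in an artificial zero.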