Search code examples
raveragepanelpanel-data

Compute Average values in panel data R


I am quite new to R. I have unbalanced panel data. BvD_ID_Number is the identification number for each firm, TotalAsset is the value for Total Assets from Balance sheets, in each period of time (Year). Here an overview:

structure(list(BvD_ID_Number = c("FR810911719", "FR810911719", 
"GBFC024701", "GBFC024701", "GBFC024701", "GBFC32536", "GBFC32699", 
"GBFC32699", "GBFC032748", "GBFC032748"), Year = c(2017, 2016, 
2018, 2017, 2016, 2017, 2016, 2015, 2017, 2016), TotalAsset = c(2220, 
1174, 124726, 126010, 121837, 72912, 111298, 74457, 6579, 6056
)), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"
))

I want, for each BvD_ID_Number, to compute the average value of TotalAsset for the entire available time window.

I used this code, but it doesn't work well:

library(dplyr)
df <- 
p_TotalAsset1 %>% 
  group_by(p_TotalAsset1$BvD_ID_Number) %>% 
   mutate(TotalAsset_Avegage = round(mean(p_TotalAsset1$TotalAsset)))

Thanks for helping me.


Solution

  • You can use summarize or mutate:

    Using summarize will fully aggregate your data to provide only the grouping variable (ID number for each firm) and the mean.

    df %>% 
      group_by(BvD_ID_Number) %>% 
      summarize(TotalAsset_Average = round(mean(TotalAsset),0))
    

    This gives us:

      BvD_ID_Number TotalAsset_Average
      <chr>                      <dbl>
    1 FR810911719                1697 
    2 GBFC024701               124191 
    3 GBFC032748                 6318.
    4 GBFC32536                 72912 
    5 GBFC32699                 92878.
    

    Using mutate

    df %>% 
      group_by(BvD_ID_Number) %>% 
      mutate(TotalAsset_Average = round(mean(TotalAsset),0))
    

    This gives us:

    # A tibble: 10 x 4
    # Groups:   BvD_ID_Number [5]
       BvD_ID_Number  Year TotalAsset TotalAsset_Average
       <chr>         <dbl>      <dbl>              <dbl>
     1 FR810911719    2017       2220               1697
     2 FR810911719    2016       1174               1697
     3 GBFC024701     2018     124726             124191
     4 GBFC024701     2017     126010             124191
     5 GBFC024701     2016     121837             124191
     6 GBFC32536      2017      72912              72912
     7 GBFC32699      2016     111298              92878
     8 GBFC32699      2015      74457              92878
     9 GBFC032748     2017       6579               6318
    10 GBFC032748     2016       6056               6318
    

    Data:

    structure(list(BvD_ID_Number = c("FR810911719", "FR810911719", 
    "GBFC024701", "GBFC024701", "GBFC024701", "GBFC32536", "GBFC32699", 
    "GBFC32699", "GBFC032748", "GBFC032748"), Year = c(2017, 2016, 
    2018, 2017, 2016, 2017, 2016, 2015, 2017, 2016), TotalAsset = c(2220, 
    1174, 124726, 126010, 121837, 72912, 111298, 74457, 6579, 6056
    )), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"
    ))