Tags: r, ggplot2, graph, panel-data

How to graph a lagged variable in panel data in R


I have a dataset of firms (an unbalanced panel) that looks like this:

id   year   tfp    c_sales    
A    2012    1.52   14.56
A    2013    1.82   15.6
A    2014    1.67   16.3
A    2015    1.72   18.36
...   ...    ...    ...
B    2012    1.58   17.56
B    2013    1.83   12.6
B    2014    1.62   19.3
B    2015    1.96   14.36
...   ...    ...    ... 
C    2012    1.2   13.4
C    2013    1.6   16.3
...   ...    ...    ...

And so on, through 2019.

How can I plot tfp from 2014 vs c_sales in 2015?

I want a scatter plot with the 2014 tfp values on the horizontal axis and the 2015 c_sales values on the vertical axis.

Since tfp is a measure of productivity, I'd like the plot to show whether firms that were productive in 2014 had higher or lower sales in 2015.

I was trying to make a plot with ggplot, but I don't have a clear idea of how to do it.

(Additionally, how can I run a regression like that, with a year-fixed independent variable?)


Solution

  • You can do it like this:

    (Although the data would be really useful!)

    library(tidyverse)
    
    df=tribble(
    ~id, ~year, ~tfp, ~c_sales, 
    "A", 2012, 1.52, 14.56, 
    "A", 2013, 1.82, 15.6, 
    "A", 2014, 1.67, 16.3, 
    "A", 2015, 1.72, 18.36, 
    "B", 2012, 1.58, 17.56, 
    "B", 2013, 1.83, 12.6, 
    "B", 2014, 1.62, 19.3, 
    "B", 2015, 1.96, 14.36, 
    "C", 2012, 1.2, 13.4, 
    "C", 2013, 1.6, 16.3, 
    "C", 2014, 1.7, 17.3, 
    "C", 2015, 1.82, 20.33
    ) 
    
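    # Called once per firm by group_modify(): `data` holds that firm's rows,
    # `group` is its one-row key (unused here).
    f = function(data, group, xYear, yYear) {
      tibble(
        xYear = xYear,
        yYear = yYear,
        # tfp in the x-axis year, c_sales in the y-axis year
        tfp = data %>% filter(year == xYear) %>% pull(tfp),
        c_sales = data %>% filter(year == yYear) %>% pull(c_sales)
      )
    }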
    
    
    df = df %>% 
      group_by(id) %>% 
      group_modify(f, xYear=2014, yYear=2015) 
    
    df
    
    

    Output:

    # A tibble: 3 x 5
    # Groups:   id [3]
      id    xYear yYear   tfp c_sales
      <chr> <dbl> <dbl> <dbl>   <dbl>
    1 A      2014  2015  1.67    18.4
    2 B      2014  2015  1.62    14.4
    3 C      2014  2015  1.7     20.3
    

    Next, plot the result:

    df %>% ggplot(aes(tfp, c_sales))+
      geom_point()
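
If you want the lagged pairs for every year at once rather than for one hand-picked pair of years, one possible alternative is a self-join: shift `year` forward by one in a copy of the data and join it back on `id` and `year`. This is only a sketch using the toy values from above. The names `panel`, `lagged`, `lagged_panel`, `pairs_2015` and `tfp_lag` are made up for illustration. Because the match is on the actual year, a firm with a missing year simply drops out of that pair, which should suit an unbalanced panel better than a row-based `lag()`.

```r
library(dplyr)
library(ggplot2)

# Toy panel with the same values as the example above
panel <- data.frame(
  id      = rep(c("A", "B", "C"), each = 4),
  year    = rep(2012:2015, times = 3),
  tfp     = c(1.52, 1.82, 1.67, 1.72,
              1.58, 1.83, 1.62, 1.96,
              1.20, 1.60, 1.70, 1.82),
  c_sales = c(14.56, 15.6, 16.3, 18.36,
              17.56, 12.6, 19.3, 14.36,
              13.4, 16.3, 17.3, 20.33)
)

# Copy of the panel where each tfp value is relabelled with the following year,
# so tfp from 2014 ends up on the 2015 row after the join
lagged <- panel %>%
  mutate(year = year + 1) %>%
  select(id, year, tfp_lag = tfp)

# Keep only firm-years that have both current c_sales and last year's tfp
lagged_panel <- panel %>%
  inner_join(lagged, by = c("id", "year")) %>%
  select(id, year, tfp_lag, c_sales)

# tfp in 2014 against c_sales in 2015
pairs_2015 <- filter(lagged_panel, year == 2015)

p <- ggplot(pairs_2015, aes(x = tfp_lag, y = c_sales)) +
  geom_point() +
  labs(x = "tfp in 2014", y = "c_sales in 2015")
p
```

Dropping the `filter()` step and adding `facet_wrap(~ year)` to the plot would give one panel per year pair.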
    

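As for the regression part of the question, here is a rough sketch with base `lm()`. "Year-fixed" can be read two ways, so both are shown: a cross-section of 2015 c_sales on 2014 tfp, and a pooled regression of c_sales on last year's tfp with year dummies. The names `panel`, `reg_data`, `tfp_lag`, `fit_2015` and `fit_pooled` are made up for illustration, and the toy data is far too small for the estimates to mean anything.

```r
library(dplyr)

# Toy panel with the same values as the example above
panel <- data.frame(
  id      = rep(c("A", "B", "C"), each = 4),
  year    = rep(2012:2015, times = 3),
  tfp     = c(1.52, 1.82, 1.67, 1.72,
              1.58, 1.83, 1.62, 1.96,
              1.20, 1.60, 1.70, 1.82),
  c_sales = c(14.56, 15.6, 16.3, 18.36,
              17.56, 12.6, 19.3, 14.36,
              13.4, 16.3, 17.3, 20.33)
)

# Attach last year's tfp to each firm-year by joining on id and year
reg_data <- panel %>%
  inner_join(
    panel %>% mutate(year = year + 1) %>% select(id, year, tfp_lag = tfp),
    by = c("id", "year")
  )

# Reading 1: cross-section, c_sales in 2015 on tfp in 2014
fit_2015 <- lm(c_sales ~ tfp_lag, data = filter(reg_data, year == 2015))

# Reading 2: all years pooled, with a dummy for each year
fit_pooled <- lm(c_sales ~ tfp_lag + factor(year), data = reg_data)

summary(fit_2015)
summary(fit_pooled)
```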