rdatasetregressiontibblepanel-data# Regression with Panel Data (Wide)?

I am trying to run the following model of GDP on population:

```
GDP_(i,t) = alpha + beta*Population_(i,t) + epsilon
```

Here, each variable is indexed by time (t) and country (i).

I have a panel dataset **df1** in a wide format like this:

```
UK_gdp <- c(4.1, 4.2, 3.8, 4.0)
US_gdp <- c(4.1, 4.2, 3.8, 4.0)
US_pop <- c(220, 230, 240, 260)
UK_pop <- c(40, 45, 47, 49)
year <- c("1965-01-01", "1966-01-01", "1967-01-01", "1968-01-01")
df1 <- tibble(UK_gdp, US_gdp, US_pop, UK_pop, year)
```

I would like to run the above regression using the columns UK_gdp, US_gdp for the *GDP_(i,t)* variable, and the columns US_pop, UK_pop as data for the *population_(i,t)* variable.
Is there any way to use the data for both countries in a regression? I do not want to run separate regressions for each country, but instead include all the data in the model when run the regression. I am not sure how to do this.

Solution

First, you want to `reshape`

your data into long format.

```
> df1$year <- strftime(df1$year, '%Y') ## this leaves just year from the date
> df1_l <- reshape(df1, varying=list(c("UK_gdp", "US_gdp"), c("US_pop", "UK_pop")),
+ v.names=c('gdp', 'pop'), times=c('UK', 'US'), timevar='country',
+ idvar='year', direction='long') |> `rownames<-`(NULL)
> df1_l
year country gdp pop
1 1965 UK 4.1 220
2 1966 UK 4.2 230
3 1967 UK 3.8 240
4 1968 UK 4.0 260
5 1965 US 4.1 40
6 1966 US 4.2 45
7 1967 US 3.8 47
8 1968 US 4.0 49
```

The equation you show, is actually just an OLS regression, pooled across all entities and time periods.

```
GDP(i,t) = alpha + beta * Population(i,t) + epsilon
```

```
> fit1 <- lm(gdp ~ pop, df1_l)
> summary(fit1)$coefficients
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.0338312297 0.1068459675 37.7537059 2.305458e-08
pop -0.0000624667 0.0006237552 -0.1001462 9.234907e-01
```

However, a better idea might be to use `"country"`

as a fixed effect, i.e.

```
GDP(i,t) = alpha_i + beta * Population(i,t) + epsilon(i,t)
```

```
> fit2 <- lfe::felm(gdp ~ pop | country, df1_l)
> summary(fit2)$coefficients
Estimate Std. Error t value Pr(>|t|)
pop -0.005082903 0.005734687 -0.8863435 0.4160214
```

Since error terms of countries are correlated, you probably should use clustered standard errors,

```
> fit3 <- lfe::felm(gdp ~ pop | country | 0 | country, df1_l)
> summary(fit3)$coefficients
Estimate Cluster s.e. t value Pr(>|t|)
pop -0.005082903 0.001638335 -3.10248 0.1985035
```

Finally, there might also be time trends (year effects), which would look like this (not working with this small example dataset):

```
GDP(i,t) = alpha_i + gamma_t + beta * Population(i,t) + epsilon(i,t)
```

```
> fit4 <- lfe::felm(gdp ~ pop | country + year | 0 | country, df1_l)
```

*Data:*

```
> dput(df1)
structure(list(UK_gdp = c(4.1, 4.2, 3.8, 4), US_gdp = c(4.1,
4.2, 3.8, 4), US_pop = c(220, 230, 240, 260), UK_pop = c(40,
45, 47, 49), year = c("1965-01-01", "1966-01-01", "1967-01-01",
"1968-01-01")), class = "data.frame", row.names = c(NA, -4L))
```

- Dynamic name with glue in mutate call
- Auto adjust in the middle text in bar
- Piping stdin to R
- Loosing names for x values on x-axis in parallel sets chart with ggforce in R
- Installing .otf Font in R
- How can I fill the empty cells in each row of a dataframe with the value in the last non-empty cell in that row?
- Getting namespace from a obo-formatted text file
- Filter data frame by character column name (in dplyr)
- Use Map function with the base R pipe |>
- Using Plotly Mapboxes Offline in R
- R: changing x-axis tick mark labels in forestplot
- Loop in R to read many files
- R or Python, Is it possible to get the necessary sample size in a proportions power test calculation with ratio of sample sizes fixed?
- Reorder values across rows and convert duplicates to NA in R
- ggplot2: How to dynamically wrap/resize/rescale x axis labels so they won't overlap
- How to fit stat_summary to rectangle in ggplot?
- Can ggfittext automatically choose rotation for text?
- In `ggfittext::geom_fit_text` , it seems can't use `vjust`
- Only collect partial of legends with patchwork
- Mutate columns based on value in a vector
- Add multiple output variables using purrr and a predefined function
- One shared legend for a cowplot grid in R
- Values not appearing in ggplot plot
- Fill first (only one) NA with next non-NA value by group using dplyr/tidyr
- use two different viridis colour palettes in a single plot
- pivot_wider issue "Values in `values_from` are not uniquely identified; output will contain list-cols"
- download multiple files from a website, then unzip every file into the current working directory
- How to reassign values based on grouping and dates (R)
- Trouble Scraping Dynamic Content from SER-SID.org with rvest in R
- unable to translate '...' to a wide string