Search code examples
rdataframemergerows

Merge two datasets with different structure and adding rows


I have two dataframes (expenditure and cabinets) which I want to combine, with different number of columns and rows.

Df expenditure has the variable year complete for each country from 1995 to 2019, while df cabinets misses some data point, i.e I do not have the complete series but some countries have for instance only 1996-1997-1999-2004-2005-2007 others have a different structure (since this is based on election date).

I basically want to add the columns from df cabinets (mostly because I need the variable polarization) to df expenditure but I cannot solve the issue of having different number of rows.

The first has this structure, with 800 obs and 130 variables:

2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 
2013, 2014), country = c("Austria", "Austria", "Austria", "Austria", 
"Austria", "Austria", "Austria", "Austria", "Austria", "Austria", 
"Austria", "Austria", "Austria", "Austria", "Austria", "Austria", 
"Austria", "Austria", "Austria", "Austria"), abv = c("aut", "aut", 
"aut", "aut", "aut", "aut", "aut", "aut", "aut", "aut", "aut", 
"aut", "aut", "aut", "aut", "aut", "aut", "aut", "aut", "aut"
), country_n = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
1, 1, 1, 1, 1), unit = c("million euro", "million euro", "million euro", 
"million euro", "million euro", "million euro", "million euro", 
"million euro", "million euro", "million euro", "million euro", 
"million euro", "million euro", "million euro", "million euro", 
"million euro", "million euro", "million euro", "million euro", 
"million euro"), election_date = c("17/12/95", NA, NA, NA, "03/10/99", 
NA, NA, "24/11/02", NA, NA, NA, "01/10/06", NA, "28/09/08", NA, 
NA, NA, NA, "29/09/13", NA), fract_leg = c(0.71, 0.71, 0.71, 
0.71, 0.71, 0.71, 0.71, 0.65, 0.65, 0.65, 0.65, 0.7, 0.7, 0.77, 
0.77, 0.77, 0.77, 0.77, 0.78, 0.78), enp_leg = c(3.49, 3.49, 
3.49, 3.49, 3.41, 3.41, 3.41, 2.88, 2.88, 2.88, 2.88, 3.37, 3.37, 
4.27, 4.27, 4.27, 4.27, 4.27, 4.59, 4.59)), row.names = c(NA, 
-20L), class = c("tbl_df", "tbl", "data.frame"))```


The second look like this, with 375 obs and 7 columns:

```structure(list(country = c("Austria", "Austria", "Austria", "Austria", 
"Austria", "Austria", "Austria", "Austria", "Austria", "Austria", 
"Austria", "Austria", "Austria", "Austria", "Belgium", "Belgium", 
"Belgium", "Belgium", "Belgium", "Belgium", "Belgium", "Belgium", 
"Belgium", "Belgium", "Belgium", "Belgium", "Belgium", "Belgium", 
"Belgium", "Bulgaria"), abv = c("AUT", "AUT", "AUT", "AUT", "AUT", 
"AUT", "AUT", "AUT", "AUT", "AUT", "AUT", "AUT", "AUT", "AUT", 
"BEL", "BEL", "BEL", "BEL", "BEL", "BEL", "BEL", "BEL", "BEL", 
"BEL", "BEL", "BEL", "BEL", "BEL", "BEL", "BGR"), year = c(1996, 
1997, 1999, 2000, 2002, 2003, 2005, 2007, 2008, 2013, 2016, 2017, 
2019, 2019, 1995, 1999, 2003, 2007, 2007, 2008, 2008, 2009, 2010, 
2011, 2014, 2014, 2018, 2019, 2019, 1995), election_date = c("17/12/95", 
"17/12/95", "03/10/99", "03/10/99", "24/11/02", "24/11/02", "24/11/02", 
"01/10/06", "28/09/08", "29/09/13", "29/09/13", "15/10/17", "15/10/17", 
"29/09/19", "21/05/95", "13/06/99", "18/05/03", "10/06/07", "10/06/07", 
"10/06/07", "10/06/07", "10/06/07", "13/06/10", "13/06/10", "25/05/14", 
"25/05/14", "25/05/14", "26/05/19", "26/05/19", "18/12/94"), 
    cabinet_name = c("Vranitzky V", "Klima I", "Klima II", "Schuessel I", 
    "Schuessel II", "Schuessel III", "Schuessel IV", "Gusenbauer", 
    "Faymann I", "Faymann II", "Kern", "Kurz I", "Bierlein I", 
    "Bierlein II", "Dehaene II", "Verhofstadt I", "Verhofstadt II", 
    "Verhofstadt III", "Verhofstadt IV", "Leterme I", "Rompuy", 
    "Leterme II", "Leterme III", "Di Rupo", "Di Rupo II", "Michel I", 
    "Michel II", "Michel III", "Wilmes I", "Videnov"), polarization = c("2,744", 
    "2,744", "2,744", "1,8761", "1,8761", "1,8761", "2,3567", 
    "2,744", "2,744", "2,744", "2,744", "1,8761", "caretaker", 
    "caretaker", "2,836", "4,4291", "4,0746", "4,0746", "4,0746", 
    "4,0746", "4,0746", "4,0746", "3,7582", "4,0746", "4,0746", 
    "1,2386", "1,2386", "1,2386", "1,2386", "0")), row.names = c(NA, 
-30L), class = c("tbl_df", "tbl", "data.frame"))```

Solution

  • From what I understand, you want to keep all the rows from expediture and add the information in cabinet. Therefore, you can use dplyr join method:

    output <- left_join(expediture, cabinet, by = c("country", "year")
    

    This will add the columns of cabinet to expediture, and write NAs to the rows where there are no information in cabinet dataframe.