Search code examples
rdataframetabular

Creating a cross-tabulated table from data frame in R


I have a data frame like this:

df <- data.frame(Country = rep(c("US","CA"),each=3),
                 Variable = c("Inflation","Unemployment","Interest rate"),
                 Month = rnorm(6), Quarter = rnorm(6)+2, Year=rnorm(6)+3)

and I would like to transform it into something like this:

enter image description here

I have tried the tables package, but did not succeed. Does anyone have any clue about how to achieve this?

EDIT: The output may be either in LaTeX or Text form.


Solution

  • You can't merge data.frame cells in R like you would in excel, but you can combine the Country and period columns. Here's a solution with dplyr + tidyr:

    library(dplyr)
    library(tidyr)
    
    df %>%
      gather(var, value, Month:Year) %>%
      unite("var", Country, var) %>%
      spread(var, value)
    

    Result:

           Variable   CA_Month CA_Quarter  CA_Year   US_Month US_Quarter  US_Year
    1     Inflation  0.2760235   1.758310 4.233976 -0.4321298  3.6232025 5.149919
    2 Interest rate -0.5208693   1.227022 3.412022  1.2283928  3.6858872 3.495870
    3  Unemployment -1.0489755   1.531800 3.634362  1.6898725  0.9299318 1.665646
    

    To actually create a table (for reporting) that has the Country merged, here is how to do it with a one-liner using the tables package:

    library(tables)
    
    tabular(Variable ~  Heading()*Country*Heading()*identity*(Month + Quarter + Year), data=df)
    

    Result:

                   CA                    US                   
     Variable      Month   Quarter Year  Month   Quarter Year 
     Inflation      0.5269 2.152   3.854 -0.9456 3.764   1.432
     Interest rate  1.3974 1.820   3.340  0.4520 1.734   3.962
     Unemployment  -0.2303 3.377   3.419 -0.6652 2.486   2.739
    

    tabular uses its unique expression to generate the table formats:

    • ~ separates the expression for rows from columns. I am displaying the rows of Variable.

    • * means that you are nesting one column into another. In this case, I am nesting the Month:Year columns in Country.

    • identity specifies that you are displaying the actual values in each cell.

    • Heading replaces the heading of the next item with a string. In this case, I am replacing "Country" and "identity" with blank.

    To output as latex, you can wrap the entire expression with the latex function:

    latex(tabular(Variable ~  Heading()*Country*Heading()*identity*(Month + Quarter + Year), data=df))
    

    Result:

    \begin{tabular}{lcccccc}
    \hline
     & \multicolumn{6}{c}{Country} \\ 
     & \multicolumn{3}{c}{CA} & \multicolumn{3}{c}{US} \\ 
    Variable  & Month & Quarter & Year & Month & Quarter & \multicolumn{1}{c}{Year} \\ 
    \hline
    Inflation  & $\phantom{-}0.5269$ & $2.152$ & $3.854$ & $-0.9456$ & $3.764$ & $1.432$ \\
    Interest rate  & $\phantom{-}1.3974$ & $1.820$ & $3.340$ & $\phantom{-}0.4520$ & $1.734$ & $3.962$ \\
    Unemployment  & $-0.2303$ & $3.377$ & $3.419$ & $-0.6652$ & $2.486$ & $2.739$ \\
    \hline 
    \end{tabular}
    

    enter image description here