Search code examples
rdataframedatatable

How do I create a data table in code in R


I have a data table as a CSV file that I use to create metrics for a dashboard. The data table includes Metric IDs and associates these with field names. This table--this definition of metrics--is largely static, and I'd like to include it within R code rather than, for example, importing a CSV file containing these headings.

The table looks something like this:

Metric_ID Metric_Name Numerator Denominator
AB0001 Number_of_Customers No_of_Customers
AB0002 Percent_New_Customers No_of_New_Customers No_of_Customers

This has about 40 rows of data, and I'd like to set this table up in code so that it is created at the time the R query is run. I'll then use it to associate metric IDs with measures I retrive through SQL queries. Sometimes this table may change -- for example, new metrics might be added or existing metrics modified. This would need some modificatoin in the code to incorporate these metrics.

The closet way I could find was to create a data table, along the lines described in the query below.

dt<-data.table(x=c(1,2,3),y=c(2,3,4),z=c(3,4,5))
dt

   x y z
1: 1 2 3
2: 2 3 4
3: 3 4 5

cbind with data table and data frame

This works for a table with a few rows or columns, but will be unwieldy for tables with 40+ rows. For example, if I wanted to modify a metric 20 rows down, I'd have to go 20 rows down in each column, and then test the table to ensure I switched the metric at the right place in each column -- especially where some metrics have empty cells. for example, I may correct the metric ID in row 20, but accidentally put the definition (a separate column) in row 19.

Is there a more straightforward way of, in essence, creating a table in code?

(I appreciate the most straightforward way would be to keep a CSV file accessible and use read_csv to import it into R. However, this doesn't work so well if colleagues are running this query on their machine and have a different file path to the CSV -- it also raises the risk of them running the query with an out-of-date metrics table, as they may not have the latest version in their files).

Thanks in advance for any guidance you might have!

Tony


Solution

  • Here are two options (examples taken from respective help pages):

    data.table::fread()

    fread("A,B
    1,2
    3,4
    ")
    #>        A     B
    #>    <int> <int>
    #> 1:     1     2
    #> 2:     3     4
    

    https://rdatatable.gitlab.io/data.table/reference/fread.html

    tibble::tribble()

    tribble(
      ~colA, ~colB,
      "a",   1,
      "b",   2,
      "c",   3
    )
    #> # A tibble: 3 × 2
    #>   colA   colB
    #>   <chr> <dbl>
    #> 1 a         1
    #> 2 b         2
    #> 3 c         3
    

    https://tibble.tidyverse.org/reference/tribble.html

    Other options:

    • If you already have the data.frame from somewhere, you can also use dput() to get a structure() code you can paste into the files you are distributing.
    • use the reprex package https://reprex.tidyverse.org/