Search code examples
rdatatable

New data frame with unique values and counts


I'd like to create a new data table from my old one that includes a count of all the "article_id" that occur for each date (i.e. there are three article_id's listed for the date 2001-10-01, so I'd like one column with the date and one column that has the article count, "3").

Here is the output of the data table:

            date       article_id  N
   1: 2001-09-01 FAS_200109_11104  3
   2: 2001-10-01 FAS_200110_11126  6
   3: 2001-10-01 FAS_200110_11157 21
   4: 2001-10-01 FAS_200110_11160  5
   5: 2001-11-01 FAS_200111_11220 26
  ---                               
7359: 2019-08-01  FAZ_201908_2958  7
7360: 2019-09-01  FAZ_201909_3316  8
7361: 2019-09-01  FAZ_201909_3515 13
7362: 2000-12-01 FAZ_200012_92981  3
7363: 2001-08-01 FAZ_200108_86041 14 

So I'll have to move over the unique date values to a new data frame (so that each date is only shown once), as well as a count of article_id's shown for each date.

I've been trying to figure this out but haven't found exactly the right answer regarding how to count the occurrence of a character vector (the article_id) by group (date). I think this is something pretty simple in R, but I'm new to the program and don't have much support so I would very much appreciate your suggestions - thank you so much!


Solution

  • The expected output is not clear. Some assumptions of expected output

    1. Sum of 'N' by 'date'
    library(data.table)
    dt[, .(N = sum(N, na.rm = TRUE)), by = date]
    
    1. Count of unique 'article_id' for each date
    dt1[, .(N = uniqueN(article_id)), by = date]
    
    1. Get the first count by 'date'
    dt1[, .(N = first(N)), by = date]