Dynamically create multiple subsets based on unique column values


I have data with a timestamp column (v3), as shown here:

   v1 v2      v3                       v4  v5
   1  apple   2/20/2015  12:09:19 AM  100  98 
   2  pear    2/19/2015  12:09:16 AM   98  97
   3  apple   2/19/2015  12:09:17 AM   NA  80
   4  apple   2/17/2015  12:09:11 AM   78  75
   5  pear    2/20/2015  12:09:12 AM   50  62
   6  cherry  2/21/2015  12:09:13 AM   75  75
   7  apple   2/20/2015  12:09:14 AM   75  75

I want to determine whether an entry occurred for each fruit type on each day. Both the file size and the number of fruit types are large.

First, for each fruit type, I want to dynamically return the subset, e.g. for apple:

   v1 v2      v3                       v4  v5
   1  apple   2/20/2015  12:09:19 AM  100  98
   3  apple   2/19/2015  12:09:17 AM   NA  80
   4  apple   2/17/2015  12:09:11 AM   78  75
   7  apple   2/20/2015  12:09:14 AM   75  75

Then, for each fruit type, I want to flag whether any entry occurred on each day (yes/no, or 1/0 as below), e.g. for apple:

   v2      v3          sign
   apple   2/17/2015   1
   apple   2/18/2015   0
   apple   2/19/2015   1
   apple   2/20/2015   1
   apple   2/21/2015   0

I am new to R, so any guidance is helpful. I am currently using unique(df$v2) but am getting stuck on naming the subsets with a hash or assign().
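
For the per-fruit subsets, one option that avoids assign() and generated variable names is base R's split(), which returns a named list with one data frame per distinct value. The following is a minimal sketch, assuming the sample data above is held in a data frame called df with v3 stored as a character timestamp; the name by_fruit is only illustrative.

```r
# Sample data mirroring the table in the question.
# Assumption: v3 is a character timestamp in month/day/year order.
df <- data.frame(
  v1 = 1:7,
  v2 = c("apple", "pear", "apple", "apple", "pear", "cherry", "apple"),
  v3 = c("2/20/2015 12:09:19 AM", "2/19/2015 12:09:16 AM",
         "2/19/2015 12:09:17 AM", "2/17/2015 12:09:11 AM",
         "2/20/2015 12:09:12 AM", "2/21/2015 12:09:13 AM",
         "2/20/2015 12:09:14 AM"),
  v4 = c(100, 98, NA, 78, 50, 75, 75),
  v5 = c(98, 97, 80, 75, 62, 75, 75),
  stringsAsFactors = FALSE
)

# One data frame per distinct value of v2, stored in a named list.
by_fruit <- split(df, df$v2)

names(by_fruit)      # the fruit types found in the data
by_fruit[["apple"]]  # only the apple rows
```

Each subset is then reachable by name, and lapply(by_fruit, ...) can run the same function over every fruit type.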


Solution

  • I ended up using xtabs, as below:

    xtabs(~v3+v2,data=df)
    

    This gave the number of entries per v2 item for each v3 value. I then replaced every value greater than 0 with 1.
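
The xtabs step and the 0/1 substitution could look roughly like the sketch below. It makes a few assumptions that are not in the original post: v3 is a character timestamp that is first reduced to its date part, and every calendar day in the observed range is used as a factor level so that days with no entries at all (such as 2/18/2015) still show up as 0. The names day, all_days, counts, present and long are illustrative.

```r
# Sample data mirroring the question (only the columns needed here).
df <- data.frame(
  v2 = c("apple", "pear", "apple", "apple", "pear", "cherry", "apple"),
  v3 = c("2/20/2015 12:09:19 AM", "2/19/2015 12:09:16 AM",
         "2/19/2015 12:09:17 AM", "2/17/2015 12:09:11 AM",
         "2/20/2015 12:09:12 AM", "2/21/2015 12:09:13 AM",
         "2/20/2015 12:09:14 AM"),
  stringsAsFactors = FALSE
)

# Keep only the date part of the timestamp (assumes month/day/year order;
# as.Date ignores the trailing time of day).
day <- as.Date(df$v3, format = "%m/%d/%Y")

# Use every calendar day in the observed range as a factor level, so days
# without any entries still appear in the table.
all_days <- seq(min(day), max(day), by = "day")
df$day <- factor(format(day), levels = format(all_days))

# Count entries per day and fruit type.
counts <- xtabs(~ day + v2, data = df)

# Replace every count greater than 0 with 1 to get presence flags.
present <- counts
present[present > 0] <- 1L

# Long format with one row per day and fruit, similar to the desired output.
long <- as.data.frame(present, responseName = "sign")
long[long$v2 == "apple", ]
```

Because the table covers all fruit types at once, no per-fruit loop is needed for this step; filtering the long form by v2 gives the per-fruit view.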