I have data with a timestamp column (v3), as shown here:
v1 v2 v3 v4 v5
1 apple 2/20/2015 12:09:19 AM 100 98
2 pear 2/19/2015 12:09:16 AM 98 97
3 apple 2/19/2015 12:09:17 AM NA 80
4 apple 2/17/2015 12:09:11 AM 78 75
5 pear 2/20/2015 12:09:12 AM 50 62
6 cherry 2/21/2015 12:09:13 AM 75 75
7 apple 2/20/2015 12:09:14 AM 75 75
I want to determine whether an entry occurred for each fruit type on each day. Both the file size and the number of fruit types are large.
First, for each fruit type, I want to dynamically return the subset, e.g. for apple:
v1 v2 v3 v4 v5
1 apple 2/20/2015 12:09:19 AM 100 98
3 apple 2/19/2015 12:09:17 AM NA 80
4 apple 2/17/2015 12:09:11 AM 78 75
7 apple 2/20/2015 12:09:14 AM 75 75
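One possible way to get these per-fruit subsets without assign() or a hand-built hash is base R's split(), which returns a named list keyed by fruit. This is only a sketch: the data frame below is rebuilt from the sample rows above, and the names by_fruit, fruit and apple_rows are made up for illustration.

```r
# Sample data rebuilt from the table in the question.
df <- data.frame(
  v1 = 1:7,
  v2 = c("apple", "pear", "apple", "apple", "pear", "cherry", "apple"),
  v3 = c("2/20/2015 12:09:19 AM", "2/19/2015 12:09:16 AM",
         "2/19/2015 12:09:17 AM", "2/17/2015 12:09:11 AM",
         "2/20/2015 12:09:12 AM", "2/21/2015 12:09:13 AM",
         "2/20/2015 12:09:14 AM"),
  v4 = c(100, 98, NA, 78, 50, 75, 75),
  v5 = c(98, 97, 80, 75, 62, 75, 75),
  stringsAsFactors = FALSE
)

# split() gives a named list with one data frame per distinct v2 value,
# so there is no need to create one variable per fruit with assign().
by_fruit <- split(df, df$v2)

# The lookup key can come from a variable, which makes it "dynamic".
fruit <- "apple"
apple_rows <- by_fruit[[fruit]]
print(apple_rows)
```

Looping over names(by_fruit) or using lapply(by_fruit, ...) then processes every fruit type in turn.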
Then, for each fruit type, I want to flag whether any entry occurred on each day (yes/no, or 0/1 as below), e.g. for apple:
v2 v3 sign
apple 2/17/2015 1
apple 2/18/2015 0
apple 2/19/2015 1
apple 2/20/2015 1
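A table like this, including days with no entry, could be built roughly as follows. This is a sketch under two assumptions that are not stated in the question: v3 is stored as text in month/day/year form, and the calendar runs from the earliest to the latest day found in the data. The names day, all_days, apple_days and result are illustrative.

```r
# Only the columns needed here, rebuilt from the sample in the question.
df <- data.frame(
  v2 = c("apple", "pear", "apple", "apple", "pear", "cherry", "apple"),
  v3 = c("2/20/2015 12:09:19 AM", "2/19/2015 12:09:16 AM",
         "2/19/2015 12:09:17 AM", "2/17/2015 12:09:11 AM",
         "2/20/2015 12:09:12 AM", "2/21/2015 12:09:13 AM",
         "2/20/2015 12:09:14 AM"),
  stringsAsFactors = FALSE
)

# Keep only the calendar day; text after the matched date part is ignored.
df$day <- as.Date(df$v3, format = "%m/%d/%Y")

# Assumed calendar: every day between the first and last day in the data.
all_days <- seq(min(df$day), max(df$day), by = "day")

# Days on which apple has at least one entry.
apple_days <- df$day[df$v2 == "apple"]

# 1 if apple appears on that day, 0 otherwise.
result <- data.frame(
  v2   = "apple",
  v3   = all_days,
  sign = as.integer(all_days %in% apple_days)
)
print(result)
```

Because the calendar is generated with seq(), 2/18/2015 shows up with a 0 even though no row in the data has that date.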
I am new to R and any guidance is helpful. I am currently using unique(df$v2) but am getting stuck on naming the results with a hash or assign().
I ended up using xtabs, as below:
xtabs(~v3+v2,data=df)
This gave the count per v2 item for each v3 value. I then replaced any count greater than 0 with 1.
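For reference, the xtabs approach might look something like the sketch below. It assumes v3 is first reduced to a calendar day (otherwise each distinct timestamp would get its own row), and it uses a factor with explicit levels so that days with no entries at all still appear. The names day, all_days, day_f, tab and present are illustrative, not from my original code.

```r
# Only the columns needed here, rebuilt from the sample in the question.
df <- data.frame(
  v2 = c("apple", "pear", "apple", "apple", "pear", "cherry", "apple"),
  v3 = c("2/20/2015 12:09:19 AM", "2/19/2015 12:09:16 AM",
         "2/19/2015 12:09:17 AM", "2/17/2015 12:09:11 AM",
         "2/20/2015 12:09:12 AM", "2/21/2015 12:09:13 AM",
         "2/20/2015 12:09:14 AM"),
  stringsAsFactors = FALSE
)

# Reduce the timestamp to a calendar day.
df$day <- as.Date(df$v3, format = "%m/%d/%Y")

# Explicit levels keep days that have no rows (here 2015-02-18) in the table.
all_days <- seq(min(df$day), max(df$day), by = "day")
df$day_f <- factor(as.character(df$day), levels = as.character(all_days))

# Counts of entries per day (rows) and fruit (columns).
tab <- xtabs(~ day_f + v2, data = df)

# Turn counts into a 0/1 presence indicator.
present <- (tab > 0) * 1L
print(present)
```

This handles all fruit types in one call, so no per-fruit loop or naming scheme is needed; as.data.frame(present) would convert the result to long form if one row per fruit and day is preferred.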