My dataset looks like the following. I am trying to answer the question below.
Based on the drawing paper data ONLY, do the stores sell more units (units.sold column) of one paper subtype (paper.type) than of the others?
To answer the above question I used the tapply function, with which I was able to get results for both papers. Now I am not sure how to proceed to get only the drawing paper data. Any help is appreciated!
My code
            date year     rep     store      paper paper.type unit.price units.sold
9991  12/30/2015 2015     Ran    Dublin watercolor      sheet       0.77          5  3.85
9992  12/30/2015 2015     Ran    Dublin    drawing       pads      10.26          1 10.26
9993  12/30/2015 2015  Arijit  Syracuse watercolor        pad      12.15          2 24.30
9994  12/30/2015 2015  Thomas Davenport    drawing       roll      20.99          1 20.99
9995  12/31/2015 2015   Ruisi    Dublin watercolor      sheet       0.77          7  5.39
9996  12/31/2015 2015   Mohit Davenport    drawing       roll      20.99          1 20.99
9997  12/31/2015 2015    Aman  Portland    drawing       pads      10.26          1 10.26
9998  12/31/2015 2015 Barakat  Portland watercolor      block      19.34          1 19.34
9999  12/31/2015 2015  Yunzhu  Syracuse    drawing    journal      24.94          1 24.94
10000 12/31/2015 2015    Aman  Portland watercolor      block      19.34          1 19.34
Note: I am new to R. Please provide an explanation along with your code.
You could start by aggregating the units.sold column by store and paper.type:
aggregate(units.sold~store+paper.type, df[df$paper == "drawing", ], sum)
#      store paper.type units.sold
#1  Syracuse    journal          1
#2    Dublin       pads          1
#3  Portland       pads          1
#4 Davenport       roll          2
Here we filter the data to keep only the rows where paper is "drawing". Based on this output, we can compare the number of units.sold for each store and paper.type.
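To compare subtypes directly, it may help to also sum units.sold per paper.type across all stores. The following is a minimal, reproducible sketch that assumes a data frame named df holding just the ten sample rows from the question (only the four columns used here). The names drawing, by_type and totals are made up for illustration.

```r
# Sample rows copied from the question; only the columns needed here.
df <- data.frame(
  store      = c("Dublin", "Dublin", "Syracuse", "Davenport", "Dublin",
                 "Davenport", "Portland", "Portland", "Syracuse", "Portland"),
  paper      = c("watercolor", "drawing", "watercolor", "drawing", "watercolor",
                 "drawing", "drawing", "watercolor", "drawing", "watercolor"),
  paper.type = c("sheet", "pads", "pad", "roll", "sheet",
                 "roll", "pads", "block", "journal", "block"),
  units.sold = c(5, 1, 2, 1, 7, 1, 1, 1, 1, 1),
  stringsAsFactors = FALSE
)

# Keep only the drawing-paper rows (logical row indexing).
drawing <- df[df$paper == "drawing", ]

# Total units per subtype, summed over all stores, largest first.
by_type <- aggregate(units.sold ~ paper.type, data = drawing, FUN = sum)
by_type <- by_type[order(-by_type$units.sold), ]
print(by_type)

# The same totals with tapply(), which returns a named vector.
totals <- tapply(drawing$units.sold, drawing$paper.type, sum)
print(totals)

# Store-by-subtype table of summed units for a side-by-side comparison.
print(xtabs(units.sold ~ store + paper.type, data = drawing))
```

On these ten rows alone, pads and roll tie at 2 units each and journal has 1, so the sample is too small to say much. Running the same calls on the full dataset should show whether one subtype clearly outsells the others.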