I want to calculate the holding period for a dataframe that consists of etherreum transactions. So far I managed to do so using this code:
holdingtime <- fullsample3 %>%
group_by(id) %>%
summarize(first_purch = min(day),
last_sale = max(day),
holdingtime = last_sale - first_purch)
however there is one issue: insome cases the sale was made before the purchase. For partial purchases and sales this leads to a problem because the max(day) then is the last purchase. I want to remove the cases where this happens becuase it is an issue with the dataset since it is a random sample. What I thought of as a solution is to use the condition that min(day) is fulfilling the requirement that ispurchase=1 and max(day) fulfills the requirement that ispurchase=0. Is it somehow possible to incorporate that condition into the above statement?
head, 20 looks like this:
date dollvalue id ispurchase purchcost ROI unrealgain isunrealgain Freq Price..Open. Real.Volume day first_purch last_sale holdingtime
1 2016-03-10 -13.1040546075 14227 1 644.34792 -82.25580715 loss 0 26 11.82282729 23068073.6 1 1 631 630
2 2016-03-10 -12.7179202325 90397 1 90.22330 -89.59669524 loss 0 6 11.82282729 23068073.6 1 1 573 572
3 2016-03-10 -91.84975893 795 1 104.50292 -27.01098230 loss 0 3 11.82282729 23068073.6 1 1 48 47
4 2016-03-10 -143.65435208484467 101708 1 498.31563 -92.47063554 loss 0 9 11.82282729 23068073.6 1 1 290 289
5 2016-03-10 -526.558346525 33668 1 620.39674 -14.06241903 loss 0 10 11.82282729 23068073.6 1 1 173 172
6 2016-03-10 -108.9339707025 46069 1 1163.70950 -81.04360573 loss 0 35 11.82282729 23068073.6 1 1 649 648
7 2016-03-10 -93.445250305 33148 1 272.07858 -38.18461972 loss 0 22 11.82282729 23068073.6 1 1 305 304
8 2016-03-10 -90.9366338825 45887 1 224.91639 -23.78622370 loss 0 17 11.82282729 23068073.6 1 1 214 213
9 2016-03-10 -38.3817161025 109509 1 38.54848 -67.13288211 loss 0 3 11.82282729 23068073.6 1 1 184 183
10 2016-03-10 -28.083862412500004 58311 1 782.23273 -79.55624103 loss 0 21 11.82282729 23068073.6 1 1 455 454
11 2016-03-10 -14.9815894575 69430 1 14.98159 -31.41139543 loss 0 2 11.82282729 23068073.6 1 1 67 66
12 2016-03-10 -14.003082945 73389 1 14.84417 -60.84059568 loss 0 3 11.82282729 23068073.6 1 1 49 48
13 2016-03-10 -12.164932215 1368 1 84.90448 -45.54548038 loss 0 13 11.82282729 23068073.6 1 1 167 166
14 2016-03-10 -12.88071801 71736 1 12.88072 5.52392315 gain 1 2 11.82282729 23068073.6 1 1 112 111
15 2016-03-10 -39.442592465000004 75958 1 864.92423 -91.06319184 loss 0 9 11.82282729 23068073.6 1 1 740 739
16 2016-03-10 -3.58626732 16621 1 824.73640 -98.01801979 loss 0 12 11.82282729 23068073.6 1 1 614 613
17 2016-03-10 -12.923672367500002 40079 1 35.54478 -0.09041456 loss 0 6 11.82282729 23068073.6 1 1 164 163
18 2016-03-10 -20.06628857 105766 1 122.41407 -49.95405644 loss 0 12 11.82282729 23068073.6 1 1 101 100
19 2016-03-10 -21.825599855 4790 1 90.66270 -52.58963225 loss 0 9 11.82282729 23068073.6 1 1 165 164
20 2016-03-10 -13.0973420675 23210 1 81.83160 -25.17269318 loss 0 11 11.82282729 23068073.6 1 1 144 143
Since there are not multiple transactions per id, here a picture of what I mean:
The green marked id is an example of where all the transactions occurred in a chronological manner, whereas the red marked id is an example of a case in which the sale occured before the purchase and therefore should be removed.
I hope this is not all too confusing and I am thankful for any kind of help.
Using your code, you can simply add the condition to your day
variable, i.e.:
holdingtime <- fullsample3 %>%
group_by(id) %>%
summarize(first_purch = min(day[ispurchase == 1]),
last_sale = max(day[ispurchase == 0]),
holdingtime = last_sale - first_purch)