Search code examples
rdate-difference

Calculate difference between highest and lowest value by group in R


I want to calculate the holding period for a dataframe that consists of etherreum transactions. So far I managed to do so using this code:

 holdingtime <- fullsample3 %>% 
  group_by(id) %>%
  summarize(first_purch = min(day), 
            last_sale = max(day), 
            holdingtime = last_sale - first_purch)

however there is one issue: insome cases the sale was made before the purchase. For partial purchases and sales this leads to a problem because the max(day) then is the last purchase. I want to remove the cases where this happens becuase it is an issue with the dataset since it is a random sample. What I thought of as a solution is to use the condition that min(day) is fulfilling the requirement that ispurchase=1 and max(day) fulfills the requirement that ispurchase=0. Is it somehow possible to incorporate that condition into the above statement?

head, 20 looks like this:

      date           dollvalue     id ispurchase  purchcost          ROI unrealgain isunrealgain Freq Price..Open. Real.Volume day first_purch last_sale holdingtime
1  2016-03-10      -13.1040546075  14227          1  644.34792 -82.25580715       loss            0   26  11.82282729  23068073.6   1           1       631         630
2  2016-03-10      -12.7179202325  90397          1   90.22330 -89.59669524       loss            0    6  11.82282729  23068073.6   1           1       573         572
3  2016-03-10        -91.84975893    795          1  104.50292 -27.01098230       loss            0    3  11.82282729  23068073.6   1           1        48          47
4  2016-03-10 -143.65435208484467 101708          1  498.31563 -92.47063554       loss            0    9  11.82282729  23068073.6   1           1       290         289
5  2016-03-10      -526.558346525  33668          1  620.39674 -14.06241903       loss            0   10  11.82282729  23068073.6   1           1       173         172
6  2016-03-10     -108.9339707025  46069          1 1163.70950 -81.04360573       loss            0   35  11.82282729  23068073.6   1           1       649         648
7  2016-03-10       -93.445250305  33148          1  272.07858 -38.18461972       loss            0   22  11.82282729  23068073.6   1           1       305         304
8  2016-03-10      -90.9366338825  45887          1  224.91639 -23.78622370       loss            0   17  11.82282729  23068073.6   1           1       214         213
9  2016-03-10      -38.3817161025 109509          1   38.54848 -67.13288211       loss            0    3  11.82282729  23068073.6   1           1       184         183
10 2016-03-10 -28.083862412500004  58311          1  782.23273 -79.55624103       loss            0   21  11.82282729  23068073.6   1           1       455         454
11 2016-03-10      -14.9815894575  69430          1   14.98159 -31.41139543       loss            0    2  11.82282729  23068073.6   1           1        67          66
12 2016-03-10       -14.003082945  73389          1   14.84417 -60.84059568       loss            0    3  11.82282729  23068073.6   1           1        49          48
13 2016-03-10       -12.164932215   1368          1   84.90448 -45.54548038       loss            0   13  11.82282729  23068073.6   1           1       167         166
14 2016-03-10        -12.88071801  71736          1   12.88072   5.52392315       gain            1    2  11.82282729  23068073.6   1           1       112         111
15 2016-03-10 -39.442592465000004  75958          1  864.92423 -91.06319184       loss            0    9  11.82282729  23068073.6   1           1       740         739
16 2016-03-10         -3.58626732  16621          1  824.73640 -98.01801979       loss            0   12  11.82282729  23068073.6   1           1       614         613
17 2016-03-10 -12.923672367500002  40079          1   35.54478  -0.09041456       loss            0    6  11.82282729  23068073.6   1           1       164         163
18 2016-03-10        -20.06628857 105766          1  122.41407 -49.95405644       loss            0   12  11.82282729  23068073.6   1           1       101         100
19 2016-03-10       -21.825599855   4790          1   90.66270 -52.58963225       loss            0    9  11.82282729  23068073.6   1           1       165         164
20 2016-03-10      -13.0973420675  23210          1   81.83160 -25.17269318       loss            0   11  11.82282729  23068073.6   1           1       144         143

Since there are not multiple transactions per id, here a picture of what I mean: enter image description here

The green marked id is an example of where all the transactions occurred in a chronological manner, whereas the red marked id is an example of a case in which the sale occured before the purchase and therefore should be removed.

I hope this is not all too confusing and I am thankful for any kind of help.


Solution

  • Using your code, you can simply add the condition to your day variable, i.e.:

    holdingtime <- fullsample3 %>% 
      group_by(id) %>%
      summarize(first_purch = min(day[ispurchase == 1]), 
                last_sale = max(day[ispurchase == 0]), 
                holdingtime = last_sale - first_purch)