I have two data frames in R
. The first data frame is a cumulative frequency distribution (cumFreqDist
) with associated periods. The first rows of the data frame look like this:
Time cumfreq
0 0.0000000
4 0.9009009
6 1.8018018
8 7.5075075
12 23.4234234
16 39.6396396
18 53.4534535
20 58.2582583
24 75.3753754
100 100.0000000
The second data frame is 10000 draws from a runif
distribution, using the code:
testData <- (runif(10000))*100
For each row in testData
, I want to locate the corresponding cumfreq
in cumFreqDist
and add the corresponding Time
value into a new column in testData
. Because testData
is a test data frame standing in for a real data frame, I do not wish to sort testData
.
Because I am dealing with cumulative frequencies, if the testData
value is 23.30...
the Time
value that should be returned is 8
. That is, I need to locate the nearest cumfreq
value that does not exceed the testData
value, and return only that one value.
The data.table
package has been mentioned for other similar questions, but my limited understanding is that this package requires a key to be identified in both data frames (after conversion to data tables) and I can't assume that the testData
values meet the requirements for being assigned as a key - and it appears that assigning a key will sort the data. This will cause me issues when I set a seed later in further work I am doing.
findInterval()
is perfect for this:
set.seed(1);
cumFreqDist <- data.frame(Time=c(0,4,6,8,12,16,18,20,24,100), cumfreq=c(0.0000000,0.9009009,1.8018018,7.5075075,23.4234234,39.6396396,53.4534535,58.2582583,75.3753754,100.0000000) );
testData <- data.frame(x=runif(10000)*100);
testData$Time <- cumFreqDist$Time[findInterval(testData$x,cumFreqDist$cumfreq)];
head(testData,20);
## x Time
## 1 26.550866 12
## 2 37.212390 12
## 3 57.285336 18
## 4 90.820779 24
## 5 20.168193 8
## 6 89.838968 24
## 7 94.467527 24
## 8 66.079779 20
## 9 62.911404 20
## 10 6.178627 6
## 11 20.597457 8
## 12 17.655675 8
## 13 68.702285 20
## 14 38.410372 12
## 15 76.984142 24
## 16 49.769924 16
## 17 71.761851 20
## 18 99.190609 24
## 19 38.003518 12
## 20 77.744522 24