DF
times a b s ex
1 0 59 140 1e-4 1
2 20 59 140 1e-4 0
3 40 59 140 1e-4 0
4 60 59 140 1e-4 2
5 120 59 140 1e-4 20
6 180 59 140 1e-4 30
7 240 59 140 1e-4 31
8 360 59 140 1e-4 37
9 0 60 140 1e-4 0
10 20 60 140 1e-4 0
11 40 60 140 1e-4 0
12 60 60 140 1e-4 0
13 120 60 140 1e-4 3300
14 180 60 140 1e-4 6600
15 240 60 140 1e-4 7700
16 360 60 140 1e-4 7700
# dput(DF)
structure(list(times = c(0, 20, 40, 60, 120, 180, 240, 360, 0,
20, 40, 60, 120, 180, 240, 360), a = c(59, 59, 59, 59, 59, 59,
59, 59, 60, 60, 60, 60, 60, 60, 60, 60), b = c(140, 140, 140,
140, 140, 140, 140, 140, 140, 140, 140, 140, 140, 140, 140, 140
), s = c(1e-04, 1e-04, 1e-04, 1e-04, 1e-04, 1e-04, 1e-04, 1e-04,
1e-04, 1e-04, 1e-04, 1e-04, 1e-04, 1e-04, 1e-04, 1e-04), ex = c(1,
0, 0, 2, 20, 30, 31, 37, 0, 0, 0, 0, 3300, 6600, 7700, 7700)), .Names = c("times",
"a", "b", "s", "ex"), row.names = c(NA, 16L), class = "data.frame")
DF2
prime times mean
g1 0 1.0000000
g1 20 0.7202642
g1 40 0.8000305
g1 60 1.7430986
g1 120 16.5172242
g1 180 25.6521268
g1 240 33.9140056
g1 360 34.5735984
#dput(DF2)
structure(list(times = c(0, 20, 40, 60, 120, 180, 240, 360),
mean = c(1, 0.7202642, 0.8000305, 1.7430986, 16.5172242,
25.6521268, 33.9140056, 34.5735984)), .Names = c("times",
"mean"), row.names = c(NA, -8L), class = "data.frame")
DF is an example of a larger data frame that actually has hundreds of combinations of 'a', 'b', and 's' values, each resulting in different 'ex' values. What I want to do is find the combination of 'a', 'b', and 's' whose 'ex' values (DF) best fit the 'mean' values (DF2) at equivalent 'times'. Each fit will compare 8 values at a time (i.e., times == c(0,20,40,60,120,180,240,360)).
In this example, I would want 59, 140, and 1e-4 for the 'a', 'b', and 's' values, because the 'ex' values (DF) for that combination best fit the 'mean' values (DF2).
Since I want a single combination of 'a', 'b', and 's' values, I think a linear least squares fit would be best. I would be comparing 8 values at a time, where 'times' runs from 0 to 360. I don't want the 'a', 'b', and 's' values that work best for each individual time point. I want the 'a', 'b', and 's' values for which all 8 'ex' values (DF) best fit all 8 'mean' values (DF2). This is where I need help.
I have never used linear least squares fitting, but I assume what I'm trying to do is possible.
lm(DF2$mean ~ DF$ex,....) # I'm not sure whether I should combine the two
# data frames first and use that as my data argument, or
# where I would include 'times' as the point of comparison;
# would that be used in subset?
It sounds like a linear model is not what you need here. A linear model will in the best case give you a linear combination of different a/b/s configurations, not the single best matching combination. Hence the term "linear" in the name.
I take it that you have some guarantee that the times values of DF will match the times values of DF2. One first step might be turning DF into a data frame where there is only one row for every a/b/s combination, and the different ex values are stored as the columns of a matrix. Then for each row, you'd want to subtract the ex values from the DF2$mean values, square those differences, and add them together, to compute a single squared error for the row. Then simply select the row with the minimal value.
The above description is pretty vague. There are a million ways to actually implement this, and instead of copying my solution, you might be better off writing it yourself, in the way you best understand it. Some hints on how to achieve the individual steps:
- matrix(DF$ex, byrow=TRUE, ncol=8) can compute the matrix
- DF[seq(from=1, to=nrow(DF), by=8),2:4] will provide the a/b/s values corresponding to each of the matrix rows
- cbind can be used to combine these two
- matrix(DF2$mean, byrow=TRUE, ncol=8, nrow=nrow(DF)/8) will turn those means into a matrix which you can simply subtract
- **2 will square all components of a matrix
- rowSums will add the elements of a row of a matrix
- which.min will return the index of the minimal value

Putting it all together in one possible way, as a single expression without intermediate variables (not the most readable solution):
DF[seq(from=1, to=nrow(DF), by=8), 2:4][which.min(
  rowSums((matrix(DF$ex, byrow=TRUE, ncol=8) -
           matrix(DF2$mean, byrow=TRUE, ncol=8, nrow=nrow(DF)/8)
          )**2
  )
), ]
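For readability, the same computation can be spelled out step by step. This is only a sketch: the intermediate names (n_times, ex_mat, params, mean_mat, sse, best) are made up for illustration, and it assumes that every a/b/s combination occupies 8 consecutive rows of DF, in the same times order as DF2.

```r
# Example data from the question
DF <- data.frame(
  times = rep(c(0, 20, 40, 60, 120, 180, 240, 360), 2),
  a = rep(c(59, 60), each = 8),
  b = 140,
  s = 1e-4,
  ex = c(1, 0, 0, 2, 20, 30, 31, 37,
         0, 0, 0, 0, 3300, 6600, 7700, 7700)
)
DF2 <- data.frame(
  times = c(0, 20, 40, 60, 120, 180, 240, 360),
  mean = c(1, 0.7202642, 0.8000305, 1.7430986,
           16.5172242, 25.6521268, 33.9140056, 34.5735984)
)

# Number of time points per a/b/s combination (assumed equal to nrow(DF2))
n_times <- nrow(DF2)

# One row per a/b/s combination, one column per time point
ex_mat <- matrix(DF$ex, byrow = TRUE, ncol = n_times)

# The a/b/s values belonging to each row of ex_mat
params <- DF[seq(from = 1, to = nrow(DF), by = n_times), c("a", "b", "s")]

# The target means, repeated once per combination
mean_mat <- matrix(DF2$mean, byrow = TRUE, ncol = n_times, nrow = nrow(ex_mat))

# Sum of squared differences for each combination
sse <- rowSums((ex_mat - mean_mat)^2)

# The combination with the smallest sum of squared differences
best <- params[which.min(sse), ]
print(best)
```

Keeping sse around also lets you inspect how close the runner-up combinations are, for example with order(sse).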
If you don't store the matrix as part of a data frame, you might want to transpose it to avoid those byrow=TRUE arguments and leverage the fact that a vector will be repeated for every column in a matrix-vector subtraction:
DF[seq(from=1, to=nrow(DF), by=8),2:4][which.min(
colSums((matrix(DF$ex, nrow=8) - DF2$mean)**2)),]
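Both expressions above depend on the rows of DF being grouped by combination and sorted by times. If that can't be guaranteed, one alternative (a different technique from the matrix approach, shown only as a sketch) is to join the two data frames on times with merge and sum the squared errors per combination with aggregate. The names merged, sq_err, sse_by_combo, and best_combo are illustrative.

```r
# Example data from the question, with the rows of DF deliberately reversed
# to show that this variant does not rely on row order
DF <- data.frame(
  times = rep(c(0, 20, 40, 60, 120, 180, 240, 360), 2),
  a = rep(c(59, 60), each = 8),
  b = 140,
  s = 1e-4,
  ex = c(1, 0, 0, 2, 20, 30, 31, 37,
         0, 0, 0, 0, 3300, 6600, 7700, 7700)
)
DF <- DF[rev(seq_len(nrow(DF))), ]
DF2 <- data.frame(
  times = c(0, 20, 40, 60, 120, 180, 240, 360),
  mean = c(1, 0.7202642, 0.8000305, 1.7430986,
           16.5172242, 25.6521268, 33.9140056, 34.5735984)
)

# Attach the target mean to every row of DF by matching on times
merged <- merge(DF, DF2, by = "times")

# Squared error of each individual observation
merged$sq_err <- (merged$ex - merged$mean)^2

# Sum the squared errors within each a/b/s combination
sse_by_combo <- aggregate(sq_err ~ a + b + s, data = merged, FUN = sum)

# Pick the combination with the smallest total
best_combo <- sse_by_combo[which.min(sse_by_combo$sq_err), ]
print(best_combo)
```

This trades some speed for robustness; with only hundreds of combinations the difference should not matter much.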