I have a dataset grouped by GaugeID. I want to fit an nls model to each group and get one output value per group.
# sample of data (yearly), first column is gauge (grouping variable), year, then two formula inputs PETvP and ETvP
# A tibble: 10 x 4
<chr> <dbl> <dbl> <dbl>
1 06892000 1981 0.854 0.754
2 06892000 1982 0.798 0.708
3 06892000 1983 1.12 0.856
4 06892000 1984 0.905 0.720
5 06892000 1985 0.721 0.618
6 06892000 1986 0.717 0.625
7 06892000 1987 0.930 0.783
8 06892000 1988 1.57 0.945
9 06892000 1989 1.15 0.739
10 06892000 1990 0.933 0.805
11 08171300 1981 0.854 0.754
12 08171300 1982 0.798 0.708
13 08171300 1983 1.12 0.856
14 08171300 1984 0.905 0.720
15 08171300 1985 0.721 0.618
16 08171300 1986 0.717 0.625
17 08171300 1987 0.930 0.783
18 08171300 1988 1.57 0.945
19 08171300 1989 1.15 0.739
20 08171300 1990 0.933 0.805
# attempted for loop
for (i in unique(yearly$GaugeID)) {
  myValue = nls(ETvP[i] ~ I(1 + PETvP[i] - (1 + PETvP[i]^(w))^(1/w)), data = yearly,
                start = list(w = 2), trace = TRUE)
}
I get the following error:
Error in model.frame.default(formula = ~ETvP + i + PETvP, data = yearly) :
variable lengths differ (found for 'i')
I haven't found much information about looping with the nls function. Essentially, I am fitting curves and need the fitted curve parameter (w) as output for each gauge. The fit works if I subset the data to a single gauge (e.g., the first gauge), but not when I try it on the entire data frame of grouped data. For example, this works:
# gaugeA
# A tibble: 10 x 4
<chr> <dbl> <dbl> <dbl>
1 06892000 1981 0.854 0.754
2 06892000 1982 0.798 0.708
3 06892000 1983 1.12 0.856
4 06892000 1984 0.905 0.720
5 06892000 1985 0.721 0.618
6 06892000 1986 0.717 0.625
7 06892000 1987 0.930 0.783
8 06892000 1988 1.57 0.945
9 06892000 1989 1.15 0.739
10 06892000 1990 0.933 0.805
test = nls(ETvP ~ I(1 + PETvP - (1 + PETvP^(w))^(1/w)), data = gaugeA,
start = list(w = 2), trace = TRUE)
1.574756 (4.26e+00): par = (2)
0.2649549 (1.46e+00): par = (2.875457)
0.09466832 (3.32e-01): par = (3.59986)
0.08543699 (2.53e-02): par = (3.881397)
0.08538308 (9.49e-05): par = (3.907099)
0.08538308 (1.13e-06): par = (3.907001)
> test
Nonlinear regression model
model: ETvP ~ I(1 + PETvP - (1 + PETvP^(w))^(1/w))
data: gaugeA
residual sum-of-squares: 0.08538
Number of iterations to convergence: 5
Achieved convergence tolerance: 1.128e-06
Any ideas on how I can get the subset results for my entire grouped dataframe? It has over 600 different gauges in it. Thank you in advance.
Any of the following will work:
Using summarise
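library(dplyr)

df %>%
  group_by(GaugeID) %>%
  # cur_data() is deprecated since dplyr 1.1.0; pick(everything()) is its replacement
  summarise(result = list(nls(ETvP ~ I(1 + PETvP - (1 + PETvP^(w))^(1/w)),
                              data = cur_data(),
                              start = list(w = 2)))) %>%
  pull(result)  # assumed final step: extract the list of fitted models so they print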
Nonlinear regression model
model: ETvP ~ I(1 + PETvP - (1 + PETvP^(w))^(1/w))
data: cur_data()
residual sum-of-squares: 0.01694
Number of iterations to convergence: 5
Achieved convergence tolerance: 7.11e-08
Nonlinear regression model
model: ETvP ~ I(1 + PETvP - (1 + PETvP^(w))^(1/w))
data: cur_data()
residual sum-of-squares: 0.1532
Number of iterations to convergence: 5
Achieved convergence tolerance: 2.685e-07
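If only the fitted w is needed for each gauge, rather than the whole model object, summarise can return it as a plain number. The following is a minimal sketch, not a definitive implementation. Several parts are my assumptions: `fit_w` is a hypothetical helper name, the `yearly` data frame is rebuilt from the sample rows in the question (year column left out because the fit does not use it), and the `tryCatch` is added so that one gauge that fails to converge gives `NA` instead of stopping the run over 600+ gauges.

```r
library(dplyr)

# Sample rows from the question; both gauges carry the same values there.
petvp <- c(0.854, 0.798, 1.12, 0.905, 0.721, 0.717, 0.930, 1.57, 1.15, 0.933)
etvp  <- c(0.754, 0.708, 0.856, 0.720, 0.618, 0.625, 0.783, 0.945, 0.739, 0.805)
yearly <- data.frame(
  GaugeID = rep(c("06892000", "08171300"), each = 10),
  PETvP   = rep(petvp, 2),
  ETvP    = rep(etvp, 2)
)

# Hypothetical helper: fit the curve for one gauge and return w,
# or NA if nls() signals an error (e.g. failure to converge).
fit_w <- function(petvp, etvp) {
  fit <- tryCatch(
    nls(ETvP ~ 1 + PETvP - (1 + PETvP^w)^(1 / w),
        data  = data.frame(PETvP = petvp, ETvP = etvp),
        start = list(w = 2)),
    error = function(e) NULL
  )
  if (is.null(fit)) NA_real_ else coef(fit)[["w"]]
}

# One row per gauge, with the fitted parameter in column w.
w_by_gauge <- yearly %>%
  group_by(GaugeID) %>%
  summarise(w = fit_w(PETvP, ETvP))

print(w_by_gauge)
```

For the first gauge's sample rows, the trace in the question shows w settling near 3.907, so values in that region would be expected here.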
Using map
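library(dplyr)  # group_split()
library(purrr)  # map()

# Returns a list with one fitted nls model per gauge
df %>%
  group_split(GaugeID) %>%
  map(~nls(ETvP ~ I(1 + PETvP - (1 + PETvP^(w))^(1/w)),
           data = .x,
           start = list(w = 2)))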
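As for why the original loop fails: in `ETvP[i]`, `i` is a gauge ID string, so nls treats `i` as one more model variable of length 1 next to the full-length columns. That is what "variable lengths differ (found for 'i')" is reporting. If a plain for loop is preferred, subsetting the rows for each gauge before fitting avoids this. The sketch below is base R only. The names `gauges`, `gauge_data` and `w_values` are my own, the data frame is rebuilt from the sample rows in the question, and the `tryCatch` is an added safeguard rather than part of the original code.

```r
# Sample rows from the question; both gauges carry the same values there.
petvp <- c(0.854, 0.798, 1.12, 0.905, 0.721, 0.717, 0.930, 1.57, 1.15, 0.933)
etvp  <- c(0.754, 0.708, 0.856, 0.720, 0.618, 0.625, 0.783, 0.945, 0.739, 0.805)
yearly <- data.frame(
  GaugeID = rep(c("06892000", "08171300"), each = 10),
  PETvP   = rep(petvp, 2),
  ETvP    = rep(etvp, 2)
)

gauges <- unique(yearly$GaugeID)

# Named vector to hold one fitted w per gauge; NA marks a failed fit.
w_values <- setNames(rep(NA_real_, length(gauges)), gauges)

for (g in gauges) {
  # Subset the rows for this gauge instead of indexing inside the formula.
  gauge_data <- yearly[yearly$GaugeID == g, ]

  fit <- tryCatch(
    nls(ETvP ~ 1 + PETvP - (1 + PETvP^w)^(1 / w),
        data  = gauge_data,
        start = list(w = 2)),
    error = function(e) NULL
  )

  if (!is.null(fit)) w_values[g] <- coef(fit)[["w"]]
}

print(w_values)
```

The result is a named numeric vector, so `w_values["06892000"]` looks up the parameter for a single gauge.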