I am trying to run a geographically weighted random forest classification using the function SpatialML::grf(). However, I am encountering the following error:
'Error in x[[jj]][iseq] <- vjj : replacement has length zero'
Here is an example:
# Install and load the required package
install.packages("SpatialML") # Install if you haven't already
library(SpatialML)
# Define an example dataset
set.seed(42)
n <- 100 # Number of observations
# Creating a dataframe with spatial coordinates
Coord <- data.frame(
x = runif(n, 0, 100), # X coordinate
y = runif(n, 0, 100) # Y coordinate
)
# Creating a dataframe with predictor variables and the categorical response variable
df <- data.frame(
category = sample(1:3, n, replace = TRUE), # Categorical numeric variable
var1 = rnorm(n, mean = 50, sd = 10), # Predictor variable 1 (e.g., temperature)
var2 = runif(n, 0, 1) # Predictor variable 2 (e.g., humidity)
)
# Fit a Geographically Weighted Random Forest (GWRF) model for categorical data
grf_model <- grf(
formula = category ~ var1 + var2, # The response variable is categorical
dframe = df,
kernel = "adaptive",
bw = 30,
coords = Coord, # Spatial coordinates
classification = TRUE # Specifying a categorical model
)
How can I fix this error? It is possible to run a GWRF for classification, right?
The details about the function are here.
Thank you in advance.
This seems to be a bug in the grf function. Running debug on the grf function, we find that the error occurs at the following line:
LM_GofFit[m, 7] <- Lcl.Model$r.squared
The Lcl.Model object is obtained by fitting a ranger::ranger() model. Perusing the documentation, we find that the object contains r.squared only when you are running a regression. So if you are running a classification, as is the case here, the object will not have an r.squared value and Lcl.Model$r.squared will return NULL. Assigning that NULL into a cell of LM_GofFit is what generates the error.
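The failure can be reproduced without SpatialML. The following is a minimal sketch in base R: fake_model and gof are hypothetical stand-ins for the classification Lcl.Model and for LM_GofFit, and it assumes LM_GofFit is a data frame (which the x[[jj]][iseq] <- vjj part of the message suggests).

```r
# Hypothetical stand-in for a ranger classification fit: it has no r.squared element
fake_model <- list(prediction.error = 0.6)

# Hypothetical stand-in for LM_GofFit (assumed here to be a data frame)
gof <- data.frame(y = numeric(3), rsq = numeric(3))

# Extracting a missing element is not an error by itself; it yields NULL
r2 <- fake_model$r.squared
print(is.null(r2))

# Assigning that NULL into a single data frame cell is the step that fails
msg <- tryCatch({
  gof[1, 2] <- r2
  "no error"
}, error = function(e) conditionMessage(e))
print(msg)
```

The printed message should match the one from the question, which would point to the assignment rather than the extraction as the failing step.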
You should file a bug report with the package creators, but I can offer a temporary fix. We can use trace(grf, edit = TRUE) to temporarily edit a function. After running the trace command, a new window will open up with the function body. Find the following line:
LM_GofFit[m, 7] <- Lcl.Model$r.squared
And replace with:
LM_GofFit[m, 7] <- ifelse(is.null(Lcl.Model$r.squared),
NA, Lcl.Model$r.squared)
This will replace the null value for r.squared with NA. The function runs successfully after making this change.
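To see what the replacement line does in isolation, here is a small sketch. na_if_null is a hypothetical helper name, not part of SpatialML or ranger, and gof is again a made-up stand-in for LM_GofFit. It uses a plain if/else, which for a single value should behave like the ifelse() call above.

```r
# Hypothetical helper: pass a value through, but turn NULL into NA
na_if_null <- function(x) if (is.null(x)) NA else x

# Made-up stand-in for LM_GofFit
gof <- data.frame(y = numeric(2), rsq = numeric(2))

gof[1, 2] <- na_if_null(NULL)  # classification case: no r.squared available
gof[2, 2] <- na_if_null(0.83)  # regression case: value is kept as-is
print(gof)
```

Edits made through trace() are temporary. Calling untrace(grf) removes them and restores the original function.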
Output
Number of Observations: 100
Number of Independent Variables: 2
Kernel: Adaptive
Neightbours: 30
--------------- Global ML Model Summary ---------------
Ranger result
Call:
ranger(category ~ var1 + var2, data = df, num.trees = 500, mtry = 1, importance = "impurity", num.threads = NULL, classification = TRUE)
Type: Classification
Number of trees: 500
Sample size: 100
Number of independent variables: 2
Mtry: 1
Target node size: 1
Variable importance mode: impurity
Splitrule: gini
OOB prediction error: 60.00 %
Importance:
var1 var2
31.33969 34.33171
Mean Square Error (Not OOB): 0
R-squared (Not OOB) %: 100
AIC (Not OOB): -Inf
AICc (Not OOB): -Inf
--------------- Local Model Summary ---------------
Residuals OOB:
Min. 1st Qu. Median Mean 3rd Qu. Max.
-2.00 -1.00 0.00 -0.04 1.00 2.00
Residuals Predicted (Not OOB):
Min. 1st Qu. Median Mean 3rd Qu. Max.
0 0 0 0 0 0
Local Variable Importance:
Min Max Mean StD
var1 6.704278 10.65329 8.765862 1.0442697
var2 6.973144 11.33329 9.010402 0.9682651
Mean squared error (OOB): 1.04
R-squared (OOB) %: -54.394
AIC (OOB): 9.922
AICc (OOB): 10.172
Mean squared error Predicted (Not OOB): 0
R-squared Predicted (Not OOB) %: 100
AIC Predicted (Not OOB): -Inf
AICc Predicted (Not OOB): -Inf
Calculation time (in seconds): 0.7686
I leave it up to you to verify if the output is sensible or not.