Tags: r, geospatial, random-forest, weighted, geographic-distance

R: Error in x[[jj]][iseq] <- vjj : replacement has length zero (Library SpatialML::grf)


I am trying to run a geographically weighted random forest classification using the function SpatialML::grf(). However, I am encountering the following error:

'Error in x[[jj]][iseq] <- vjj : replacement has length zero'

Here is an example:

# Install and load the required package
install.packages("SpatialML")  # Install if you haven't already
library(SpatialML)

# Define an example dataset
set.seed(42)
n <- 100  # Number of observations

# Creating a dataframe with spatial coordinates
Coord <- data.frame(
  x = runif(n, 0, 100),  # X coordinate
  y = runif(n, 0, 100)   # Y coordinate
)

# Creating a dataframe with predictor variables and the categorical response variable
df <- data.frame(
  category = sample(1:3, n, replace = TRUE),  # Categorical numeric variable
  var1 = rnorm(n, mean = 50, sd = 10),  # Predictor variable 1 (e.g., temperature)
  var2 = runif(n, 0, 1)  # Predictor variable 2 (e.g., humidity)
)

# Fit a Geographically Weighted Random Forest (GWRF) model for categorical data
grf_model <- grf(
  formula = category ~ var1 + var2,  # The response variable is categorical
  dframe = df,
  kernel = "adaptive",
  bw = 30,
  coords = Coord,  # Spatial coordinates
  classification = TRUE  # Specifying a categorical model
)

How can I fix this error? It is possible to run a GWRF for classification, right?

The details about the function are here.

Thank you in advance.


Solution

  • This appears to be a bug in grf(). Stepping through grf() with debug(), we find that the error occurs at the following line:

    LM_GofFit[m, 7] <- Lcl.Model$r.squared
    

    The Lcl.Model object is obtained by fitting a ranger::ranger() model. According to the ranger documentation, the returned object contains r.squared only for regression forests. This example fits a classification forest, so Lcl.Model$r.squared is NULL. Assigning NULL to a single data frame cell (LM_GofFit[m, 7] <- NULL) is what fails with "replacement has length zero".
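    You can reproduce the failure mechanism without SpatialML. The sketch below shows three things:

    - `$` on a missing list element returns NULL instead of failing.
    - Assigning that NULL into one data frame cell raises the same error as in the question.
    - If ranger is installed, a real classification fit has no r.squared element while a regression fit does.

    The data frame `gof` is a hypothetical stand-in for grf()'s internal LM_GofFit table.

```r
# Hypothetical stand-in for grf()'s internal goodness-of-fit table
gof <- data.frame(mse = numeric(2), r2 = numeric(2))

# A list without an r.squared element, like a ranger classification fit
cls_fit <- list(prediction.error = 0.6)
r2 <- cls_fit$r.squared      # `$` on a missing element returns NULL, no error
print(is.null(r2))

# Assigning NULL into one data frame cell is the step that actually fails
msg <- tryCatch({
  gof[1, 2] <- r2
  "no error"
}, error = function(e) conditionMessage(e))
print(msg)

# Optional: confirm with real ranger fits, only if ranger is installed
if (requireNamespace("ranger", quietly = TRUE)) {
  set.seed(1)
  d <- data.frame(y = sample(1:3, 50, replace = TRUE), x = rnorm(50))
  fit_cls <- ranger::ranger(y ~ x, data = d, classification = TRUE)
  fit_reg <- ranger::ranger(x ~ y, data = d)
  print(c(classification_r2_missing = is.null(fit_cls$r.squared),
          regression_r2_missing = is.null(fit_reg$r.squared)))
}
```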

    You should file a bug report with the package creators, but I can offer a temporary fix. We can use trace(grf, edit = TRUE) to temporarily edit a function. After running the trace command, a new window will open up with the function body. Find the following line:

    LM_GofFit[m, 7] <- Lcl.Model$r.squared
    

    And replace with:

    LM_GofFit[m, 7] <- if (is.null(Lcl.Model$r.squared)) NA else
                           Lcl.Model$r.squared
    

    This stores NA instead of NULL when r.squared is absent. The function runs successfully after making this change.
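    trace(grf, edit = TRUE) is interactive, and the edit lasts only for the current R session. Running untrace(grf) removes it. If you would rather apply the workaround from a script, one option is to rewrite the offending statement with `body<-`.

    The sketch below applies that idea to a hypothetical toy function, fill_gof(), that mimics the failing assignment. It is not SpatialML code. Doing the same to grf() would mean locating the statement inside body(grf), where it sits inside a loop, and I have not checked its exact position.

```r
# Hypothetical toy function reproducing grf()'s failing statement
fill_gof <- function(model) {
  gof <- data.frame(r2 = 0)
  gof[1, 1] <- model$r.squared   # fails when r.squared is absent
  gof
}

cls_fit <- list(prediction.error = 0.6)   # no r.squared, like a classification fit
reg_fit <- list(r.squared = 0.42)         # regression-like fit

# body(fill_gof)[[1]] is `{`, so the failing assignment is element 3
body(fill_gof)[[3]] <- quote(
  gof[1, 1] <- if (is.null(model$r.squared)) NA else model$r.squared
)

print(fill_gof(cls_fit))   # r2 is NA instead of an error
print(fill_gof(reg_fit))   # r2 is 0.42, so regression behaviour is unchanged
```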

    Output

    Number of Observations: 100
    Number of Independent Variables: 2
    Kernel: Adaptive
    Neightbours: 30
    
    --------------- Global ML Model Summary ---------------
    
    Ranger result
    
    Call:
     ranger(category ~ var1 + var2, data = df, num.trees = 500, mtry = 1,      importance = "impurity", num.threads = NULL, classification = TRUE) 
    
    Type:                             Classification 
    Number of trees:                  500 
    Sample size:                      100 
    Number of independent variables:  2 
    Mtry:                             1 
    Target node size:                 1 
    Variable importance mode:         impurity 
    Splitrule:                        gini 
    OOB prediction error:             60.00 % 
    
    Importance:
    
        var1     var2 
    31.33969 34.33171 
    
    Mean Square Error (Not OOB): 0
    R-squared (Not OOB) %: 100
    AIC (Not OOB): -Inf
    AICc (Not OOB): -Inf
    
    --------------- Local Model Summary ---------------
    
    
    Residuals OOB:
    
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
      -2.00   -1.00    0.00   -0.04    1.00    2.00 
    
    Residuals Predicted (Not OOB):
    
       Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
          0       0       0       0       0       0 
    
    Local Variable Importance:
    
              Min      Max     Mean       StD
    var1 6.704278 10.65329 8.765862 1.0442697
    var2 6.973144 11.33329 9.010402 0.9682651
    
    Mean squared error (OOB): 1.04
    R-squared (OOB) %: -54.394
    AIC (OOB): 9.922
    AICc (OOB): 10.172
    Mean squared error Predicted (Not OOB): 0
    R-squared Predicted (Not OOB) %: 100
    AIC Predicted (Not OOB): -Inf
    AICc Predicted (Not OOB): -Inf
    
    Calculation time (in seconds): 0.7686
    

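    I leave it up to you to verify whether the output is sensible. Several statistics in the summary (the residuals, mean squared error, R-squared and AIC) appear to be computed by treating the class labels 1, 2 and 3 as numbers, so they are not meaningful for a classification model. The perfect "Not OOB" values (MSE 0, R-squared 100%) are likely in-sample predictions on the training data. The OOB prediction error (60.00% here) is the more informative figure.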
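    As a rough check, compare the 60.00% OOB error with what uninformative guessing would give. In this example category is drawn at random, independently of var1, var2 and the coordinates, so no model should beat chance by much.

    The sketch below regenerates the example data with the same seed and computes two baselines:

    - the error of always predicting the most frequent class;
    - the error of guessing one of three classes uniformly.

    It assumes the data-generation code runs in the same order as in the question, so that the random draws match.

```r
# Regenerate the example data exactly as in the question
set.seed(42)
n <- 100
Coord <- data.frame(x = runif(n, 0, 100), y = runif(n, 0, 100))
df <- data.frame(
  category = sample(1:3, n, replace = TRUE),
  var1 = rnorm(n, mean = 50, sd = 10),
  var2 = runif(n, 0, 1)
)

# Error rate of always predicting the most frequent class
counts <- table(df$category)
majority_error <- 1 - max(counts) / n

# Expected error of guessing one of three classes uniformly at random
uniform_error <- 1 - 1 / 3

print(counts)
print(c(majority = majority_error, uniform = uniform_error))
```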