Bayesian Modelling in R


I am trying to implement a Bayesian model in R using the BAS package, with the following settings for my model:

databas <- bas.lm(at_areabuilding ~ ., data = dataCOMMA, method = "MCMC", prior = "ZS-null", modelprior = uniform())

I am trying to predict the area for a given state from the areas recorded for that state across different zip codes. The model finds the zip codes present in the data for a given state (using a state index) and then gives the output.

Whenever I want to predict the area for a state, I give this input:

> UT <- data.frame(zip = 84321, loc_st_prov_cd = "UT" ,state_idx = 7)
> predict_1 <- predict(databas,UT, estimator="BMA", interval = "predict", se.fit=TRUE)
> data.frame('state' = 'UT','estimated area' = predict_1$Ybma)

This gives me the prediction for that one state. Now suppose I have a list of states with their zip codes and I want predictions from my model (databas) for the whole list. Doing it one state at a time as above would take too long. Is there another way to do this? With someone's help I wrote the following code:

 pred <- sapply(1:nrow(first), function(row) { predict(basdata,first[row, ],estimator="BMA", interval = "predict", se.fit=TRUE)$Ybma })

Here basdata is my model and first is the new dataset for which I am predicting area. The problem is that this code takes a long time to run: it iterates over every row and predicts the area for each one. There are 150,000 rows in my dataset, and I would appreciate any help optimizing the performance of this code.


Solution

  • Something like this will iterate over each row of your data frame of states, zips and indices (let's call it states_and_zips) and return a list of predictions. Each element of this list (which I've called pred) goes with the corresponding row of states_and_zips:

    pred = lapply(1:nrow(states_and_zips), function(row) {
      predict(databas, newdata = states_and_zips[row, ],
              estimator="BMA", interval = "predict", se.fit=TRUE)$Ybma
    })
    

    If Ybma is a single value, then use sapply instead of lapply; it will return a vector of predictions, one for each row of states_and_zips, which you can add as a new column to states_and_zips.
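
    Since the row-by-row loop is the slow part, it may be worth trying a single predict call on the whole data frame. This assumes predict() for a bas object accepts a multi-row newdata data frame, as R predict methods generally do. A minimal sketch on simulated data follows; the columns x1, x2, y and the objects train, fit and new_rows are hypothetical stand-ins for your own data and model:

```r
library(BAS)

set.seed(1)

# Simulated stand-in for the training data (hypothetical columns).
train <- data.frame(x1 = rnorm(200), x2 = rnorm(200))
train$y <- 3 + 2 * train$x1 - train$x2 + rnorm(200)

# Same settings as in the question, fitted to the toy data.
fit <- bas.lm(y ~ ., data = train, method = "MCMC",
              prior = "ZS-null", modelprior = uniform())

# New data with the same predictor columns as the training data.
new_rows <- data.frame(x1 = rnorm(1000), x2 = rnorm(1000))

# One call for all rows instead of one call per row.
pred_all <- predict(fit, newdata = new_rows, estimator = "BMA", se.fit = TRUE)

# Ybma holds one model-averaged prediction per row of new_rows.
new_rows$estimated_area <- as.numeric(pred_all$Ybma)
head(new_rows)
```

    If this works on your data, the per-row overhead of predict is paid once rather than 150,000 times. If the full data frame is too large to predict in one go, splitting it into a few large chunks should still be much cheaper than looping over single rows.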