I feel like this is a common issue, yet I can't seem to find an answer. I am working with a sampled dataset and attempting to calculate calibration weights against known population totals using the sampling package. The calib function takes a sample column or matrix (Xs), initial weights (d), and a population total (total), and calculates g-weights. Below is the first column in my dataset, which yields the error:
Error in svd(X) : infinite or missing values in 'x'
Other columns also yield this error, but for simplicity's sake, I've only included the first. Reproducible example below:
library("sampling")
# Sample
Xs = c(3793, 4505, 2272, 1126, 1839, 2060, 9077, 3174, 4013, 1673, 1299, 3981, 1770, 1059, 899, 2475, 1731, 2135, 843, 1880, 7887, 6402, 3022, 3345, 3954, 4489, 6222, 694)
Xs <- as.matrix(sapply(Xs, as.numeric))
# Initial Weights
d = rep(1, nrow(Xs))
# Population total
total = c(1616772)
g = calib(Xs, d, total, method="logit")
Based on similar questions on S.O., I then searched for infinite and NaN values using the following code and found nothing:
length(Xs)
sum(is.finite(Xs))
sum(is.nan(Xs))
length(d)
sum(is.finite(d))
sum(is.nan(d))
length(total)
sum(is.finite(total))
sum(is.nan(total))
[1] 28
[1] 28
[1] 0
[1] 28
[1] 28
[1] 0
[1] 1
[1] 1
[1] 0
My apologies if this is elementary. Thank you.
I am not familiar with the package, but I looked and the error comes from the function base::svd(). I suspected it might be an initialization problem, so I tried the following and it worked:
library(sampling)
Xs = c(3793, 4505, 2272, 1126, 1839, 2060, 9077, 3174, 4013, 1673, 1299, 3981, 1770, 1059, 899, 2475, 1731, 2135, 843, 1880, 7887, 6402, 3022, 3345, 3954, 4489, 6222, 694)
Xs <- as.matrix(sapply(Xs, as.numeric))
# Initial Weights
d = rep(1, nrow(Xs))
# Population total
total = c(1616772)
g = calib(Xs, d, total, method="logit")
Be careful, though! Trying different initializations for d gives me very different results, e.g. compare with using: d = rep(2, nrow(Xs)).
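One plausible reading of why the starting weights matter so much: with a single auxiliary variable, the calibration constraint is sum(d * g * Xs) = total, so the g-weights have to average total / sum(d * Xs) (weighted by d * Xs). The logit method keeps every g-weight strictly between a lower and an upper bound, so the constraint cannot be met if that ratio falls outside the bounds. The sketch below only uses base R and assumes that calib's default bounds are c(low = 0, upp = 10); please check ?calib before relying on that. The helper required_g is a hypothetical name introduced here for illustration, not part of the sampling package.

```r
# Sample values copied from the question
Xs <- c(3793, 4505, 2272, 1126, 1839, 2060, 9077, 3174, 4013, 1673, 1299,
        3981, 1770, 1059, 899, 2475, 1731, 2135, 843, 1880, 7887, 6402,
        3022, 3345, 3954, 4489, 6222, 694)
total <- 1616772

# Hypothetical helper: the (d * Xs)-weighted average that the g-weights
# must reach so that sum(d * g * Xs) equals total
required_g <- function(d) total / sum(d * Xs)

# ASSUMPTION: default bounds of calib(); verify with ?calib
bounds <- c(low = 0, upp = 10)

for (d0 in c(1, 2)) {
  g_needed <- required_g(rep(d0, length(Xs)))
  feasible <- g_needed > bounds[["low"]] && g_needed < bounds[["upp"]]
  cat(sprintf("d = %g: required g = %.2f, within bounds: %s\n",
              d0, g_needed, feasible))
}
```

If this reading is right, d = rep(1, ...) asks for g-weights well above the assumed upper bound, while d = rep(2, ...) falls inside it, which would explain why the two choices behave so differently. Starting from the actual design weights, or passing wider bounds to calib, may be worth trying.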