
R: Sampling: Calib function: Error in svd(X) : infinite or missing values in 'x'


I feel like this is a common issue, yet I can't seem to find an answer. I am working with a sampled dataset and trying to calculate calibration weights against known population totals using the sampling package. The calib() function takes a sample column or matrix of auxiliary variables (Xs), initial weights (d), and population totals (total), and returns the g-weights. Below is the first column in my dataset, which yields the error:

 Error in svd(X) : infinite or missing values in 'x'

Other columns also yield this error, but for simplicity's sake I've only included the first. Reproducible example:

library("sampling")
# Sample
Xs = c(3793, 4505, 2272, 1126, 1839, 2060, 9077, 3174, 4013, 1673, 1299, 3981, 1770, 1059,  899, 2475, 1731, 2135,  843, 1880, 7887, 6402, 3022, 3345, 3954, 4489, 6222, 694)
Xs <- as.matrix(sapply(Xs, as.numeric))

# Initial Weights
d = rep(1, nrow(Xs))

# Population total
total = c(1616772)

g = calib(Xs, d, total, method="logit")

Based on similar questions on Stack Overflow, I then checked for infinite and NaN values with the following code, and found none:

length(Xs)
sum(is.finite(Xs))
sum(is.nan(Xs))
length(d)
sum(is.finite(d))
sum(is.nan(d))
length(total)
sum(is.finite(total))
sum(is.nan(total))

[1] 28
[1] 28
[1] 0

[1] 28
[1] 28
[1] 0

[1] 1
[1] 1
[1] 0
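The checks above can be condensed into a single call. The base-R sketch below (the names inputs_ok and svd_msg are illustrative) also shows that svd() raises exactly the reported message whenever its argument contains NA, NaN or Inf. Since the inputs pass, the non-finite values are presumably created inside calib() before it reaches svd().

```r
# Base-R sketch: the finiteness checks above, collapsed into one expression.
Xs <- as.matrix(c(3793, 4505, 2272, 1126, 1839, 2060, 9077, 3174, 4013, 1673,
                  1299, 3981, 1770, 1059,  899, 2475, 1731, 2135,  843, 1880,
                  7887, 6402, 3022, 3345, 3954, 4489, 6222,  694))
d     <- rep(1, nrow(Xs))
total <- 1616772

# is.finite() is FALSE for NA, NaN, Inf and -Inf, so this covers every case
# tested by the separate sum(is.finite(...)) and sum(is.nan(...)) lines.
inputs_ok <- all(is.finite(Xs)) && all(is.finite(d)) && all(is.finite(total))
print(inputs_ok)

# svd() stops with the reported message for any non-finite input, which
# suggests NaN/Inf values are produced somewhere inside calib() itself.
svd_msg <- tryCatch(svd(matrix(c(1, NaN, 2, 3), nrow = 2)),
                    error = function(e) conditionMessage(e))
print(svd_msg)
```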

My apologies if this is elementary. Thank you.


Solution

  • I am not familiar with the package, but I looked, and the error comes from the function base::svd(). I suspected it might be an initialization problem, so I tried the following, and it worked:

    library(sampling)
    Xs = c(3793, 4505, 2272, 1126, 1839, 2060, 9077, 3174, 4013, 1673, 1299, 3981, 1770, 1059,  899, 2475, 1731, 2135,  843, 1880, 7887, 6402, 3022, 3345, 3954, 4489, 6222, 694)
    Xs <- as.matrix(sapply(Xs, as.numeric))
    
    # Initial Weights
    d = rep(1, nrow(Xs))
    
    # Population total
    total = c(1616772)
    
    g = calib(Xs, d, total, method="logit")
    

    Be careful, though! Trying different initializations for d gives me very different results; for example, compare with d = rep(2, nrow(Xs)).
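A likely explanation for both the error and the sensitivity to d, assuming calib() uses its documented default bounds = c(low = 0, upp = 10) for method = "logit" (worth confirming with ?calib for the installed version): logit calibration keeps every g-weight strictly between low and upp, and because sum(d * g * Xs) must equal total, the (d * Xs)-weighted mean of the g-weights must equal total / sum(d * Xs). The base-R sketch below computes that required value; required_g() is a hypothetical helper, not a function from sampling.

```r
# Base-R sketch (sampling not needed). ASSUMPTION: calib()'s default bounds
# for method = "logit" are c(low = 0, upp = 10); check ?calib.
Xs <- c(3793, 4505, 2272, 1126, 1839, 2060, 9077, 3174, 4013, 1673,
        1299, 3981, 1770, 1059,  899, 2475, 1731, 2135,  843, 1880,
        7887, 6402, 3022, 3345, 3954, 4489, 6222,  694)
total  <- 1616772
bounds <- c(low = 0, upp = 10)

# Hypothetical helper: since sum(d * g * Xs) must equal total, this is the
# (d * Xs)-weighted mean that the g-weights have to reach.
required_g <- function(Xs, d, total) total / sum(d * Xs)

g_needed <- c(d1 = required_g(Xs, rep(1, length(Xs)), total),
              d2 = required_g(Xs, rep(2, length(Xs)), total))
print(round(g_needed, 2))

# Logit g-weights stay strictly inside (low, upp), so a weighted mean at or
# above upp cannot be reached: no solution exists, and the iterations can
# plausibly diverge into NaN/Inf, which svd() then rejects.
print(g_needed > bounds[["low"]] & g_needed < bounds[["upp"]])
```

It prints roughly 18.45 for d = 1 and 9.23 for d = 2, so only the second target lies below upp = 10, which matches the difference observed above. If this is the cause, one option is to widen the range, e.g. calib(Xs, d, total, method = "logit", bounds = c(low = 0, upp = 25)); another is to start from design weights (normally the inverse inclusion probabilities) rather than d = 1, which treats the 28 sampled units as if they were the whole population. Convergence of the logit iterations is still not guaranteed and is worth checking on the real data.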