Search code examples
rsurveyquantile

Compute quantiles incorporating Sample Design (Survey package)


I want to compute a new column using the quantiles of another column (a continuous variable) incorporating the Sample Design of a complex survey. The idea is to create in the the data frame a new variable that indicates which quantile group each observation falls into

Here is how I execute the idea without incorporating the sample design, so you can understand what I'm aiming for.

# Load Data
  data(api)


# Convert data to data.table format (mostly to increase speed of the process)
  apiclus1 <- as.data.table(apiclus1)

# Create deciles variable
apiclus1[, decile:=cut(api00,
                       breaks=quantile(api00,
                                       probs=seq(0, 1, by=0.1), na.rm=T),
                       include.lowest= TRUE, labels=1:10)]

I've tried using svyquantile from the survey package, but I couldn't get my head around this problem. This code does not return the quantile groups as an output that I can feed into a new variable. Any thoughts on this?

# Load Package
 library(survey)

# create survey design
 dclus1 <- svydesign(id=~dnum, weights=~pw, data=apiclus1, fpc=~fpc)

# What I've tried to do
  svyquantile(~api00, design = dclus1, quantiles = seq(0, 1, by=0.1), method = "linear", ties="rounded")

Solution

  • library(survey)
    
    data(api)
    
    dclus1 <- svydesign(id=~dnum, weights=~pw, data=apiclus1, fpc=~fpc)
    
    a <- svyquantile(~api00, design = dclus1, quantiles = seq(0, 1, by=0.1), method = "linear", ties="rounded")
    
    # use factor() and findInterval()
    dclus1 <- update( dclus1 , qtile = factor( findInterval( api00 , a ) ) )
    
    # distribution
    svymean( ~ qtile , dclus1 )
    
    
    # or without the one observation in group number 11
    dclus1 <- update( dclus1 , qtile = factor( findInterval( api00 , a[ -length( a ) ] ) ) )
    
    # distribution
    svymean( ~ qtile , dclus1 )
    
    
    
    # quantiles by group
    b <- svyby(~api00, ~stype, design = dclus1, svyquantile, quantiles = seq(0, 0.9 , by=0.1) ,ci=T)
    
    # copy over your data
    x <- apiclus1
    
    # stype of each record
    match( x$stype , b$stype ) 
    
    # create the new qtile variable
    x$qtile_by_stype <- factor( diag( apply( data.frame( b )[ match( x$stype , b$stype ) , 2:11 ] , 1 , function( v , w ) findInterval( w , v ) , x$api00 ) ) )
    
    # re-create the survey design
    dclus1 <- svydesign(id=~dnum, weights=~pw, data=x, fpc=~fpc)
    
    # confirm you have quantiles
    svyby( ~ qtile_by_stype , ~ stype , dclus1 , svymean )