I'm trying out discrepancy analysis. Because my sequence data set is large, I'm using weights with the WeightedCluster package. Everything works smoothly until I get to the actual dissassoc() call, where I can't seem to get my group variables found.
I've tried closely following the examples from the WeightedCluster manual and from Studer et al.'s 2011 article. The post "How to use discrepancy analysis with TraMineR and aggregated sequence data?" was useful and got me further, but I cannot figure out how to get from there to passing those separate group variables to the group argument of dissassoc(). Let's say I'm using the same example data (although my original data doesn't have sampling weights), but I can only use aggregated data:
## Aggregate example data
mvad.agg <- wcAggregateCases(mvad[, c(10:12, 17:86)], weights=mvad$weight)
mvad.agg
## Define sequence object
mvad.agg.seq <- seqdef(mvad[mvad.agg$aggIndex, 17:86], alphabet=mvad.alphabet,
                       states=mvad.scodes, labels=mvad.labels,
                       weights=mvad.agg$aggWeights)
## Computing OM dissimilarities
mvad.agg.dist <- seqdist(mvad.agg.seq, method="OM", indel=1.5, sm="CONSTANT")
## Discrepancy analysis
dissassoc (mvad.agg.dist, group = mvad$gcse5eq, weights = mvad.agg$aggWeights, weight.permutation = "replicate")
So in the last step, I cannot figure out how to link to the group variable. I've tried different ways of defining the group (e.g., mvad.agg$gcse5eq, mvad$gcse5eq) and many variations of disaggregating/aggregating and weighting/unweighting the data, but I get either "Object gcse5eq not found" or "Error in diss[!is.na(group), !is.na(group)] : incorrect number of dimensions".
I'm new to SO, so hopefully my example is clear and useful. I hope someone can help!
First, you need to include your covariate in the table provided to wcAggregateCases. (Here gcse5eq is column 12 of mvad and already belongs to mvad[, c(10:12, 17:86)].)
Then, as the group variable, you have to provide the values of the covariate for the cases selected by wcAggregateCases. You do that by indexing the covariate with $aggIndex. I illustrate below:
library(TraMineR)
library(WeightedCluster)
## Load example data and assign labels
data(mvad)
mvad.alphabet <- c("employment", "FE", "HE", "joblessness", "school", "training")
mvad.labels <- c("Employment", "Further Education", "Higher Education",
                 "Joblessness", "School", "Training")
mvad.scodes <- c("EM", "FE", "HE", "JL", "SC", "TR")
## Aggregate example data
mvad.agg <- wcAggregateCases(mvad[, c(10:12, 17:86)], weights=mvad$weight)
## Define the sequence object
mvad.agg.seq <- seqdef(mvad[mvad.agg$aggIndex, 17:86], alphabet=mvad.alphabet,
                       states=mvad.scodes, labels=mvad.labels,
                       weights=mvad.agg$aggWeights)
## Computing OM dissimilarities
mvad.agg.dist <- seqdist(mvad.agg.seq, method="OM", indel=1.5, sm="CONSTANT")
## Discrepancy analysis
dissassoc(mvad.agg.dist, group = mvad$gcse5eq[mvad.agg$aggIndex],
          weights = mvad.agg$aggWeights,
          weight.permutation = "random-sampling")
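To see why the group vector has to be subset with the aggregation index, here is a minimal base-R sketch on made-up data. It does not call WeightedCluster: `which(!duplicated(toy))` is only an assumed stand-in for what `$aggIndex` provides (one row position per distinct case), and the `toy`, `agg_index`, `agg_weights` and `group_agg` names are hypothetical.

```r
# Toy data (hypothetical): 6 cases, but only 3 distinct rows.
# The covariate "group" is part of the aggregated table, so cases that share
# a sequence but differ on the covariate would not be merged.
toy <- data.frame(group = c("a", "a", "b", "b", "b", "a"),
                  s1    = c("X", "X", "Y", "Y", "Y", "Z"),
                  s2    = c("X", "X", "Y", "Y", "Y", "Z"))

# Stand-in for $aggIndex: position of the first occurrence of each distinct row.
agg_index <- which(!duplicated(toy))

# Stand-in for $aggWeights: how many original cases each distinct row represents.
key <- do.call(paste, c(toy, sep = "\r"))
agg_weights <- as.vector(table(factor(key, levels = key[agg_index])))

# The covariate restricted to the retained cases: one value per distinct row.
group_agg <- toy$group[agg_index]

length(toy$group)  # 6: one value per original case
length(group_agg)  # 3: one value per aggregated case
```

A dissimilarity matrix computed on the aggregated cases has one row per entry of `agg_index`, so only `group_agg`, not the full `toy$group`, lines up with it.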
Note that I use weight.permutation = "random-sampling" in the dissassoc() call because we have non-integer weights.
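If you are unsure which case applies to your own weights, a small check along these lines may help. This is only a sketch: `pick_weight_permutation` is a hypothetical helper, not a TraMineR function, and it assumes that "replicate" is the appropriate choice whenever all weights are whole numbers.

```r
# Hypothetical helper (not part of TraMineR or WeightedCluster):
# suggest a value for the weight.permutation argument of dissassoc().
pick_weight_permutation <- function(w, tol = 1e-8) {
  stopifnot(is.numeric(w), all(w >= 0))
  # Whole-number weights (e.g. plain counts of identical cases) can be replicated;
  # fractional weights (e.g. sampling weights) cannot.
  if (all(abs(w - round(w)) < tol)) "replicate" else "random-sampling"
}

pick_weight_permutation(c(2, 3, 1))      # counts only
pick_weight_permutation(c(0.7, 1.3, 2))  # fractional weights
```

With data that have no sampling weights, the aggregated weights are simple counts, so the first case would apply; with mvad, where mvad$weight is fractional, the second one does.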