r package indices entropy mutual-information

Dissimilarity Index using Segregation Package


I am trying to calculate the dissimilarity index of several schools in a country using the segregation package. My dataset currently looks like this:

# A tibble: 948 × 4
   ethnicity              school                 acyear    n
   <chr>                  <chr>                  <chr>   <dbl>
 1 White                  school 1               2010/11  3245
 2 Unknown/not applicable school 1               2010/11   675
 3 Other                  school 1               2010/11     5
 4 Mixed                  school 1               2010/11    50
 5 Black                  school 1               2010/11    40
 6 Asian                  school 1               2010/11    95
 7 White                  school 2               2010/11  5905
 8 Unknown/not applicable school 2               2010/11  1060
 9 Other                  school 2               2010/11    15
10 Mixed                  school 2               2010/11   115
# … with 938 more rows

The command I am using is very similar to the one I used to calculate the Mutual Information Index and Theil's Entropy Index:

library(segregation)

# acyear1 holds the rows for a single academic year
dissimilarity(acyear1,
              group = "ethnicity",
              unit = "school",
              weight = "n")

However, I am getting the following error:

Error in dissimilarity(acyear1, group = "ethnicity", unit = "school", weight = "n") : 
  The D index only allows two distinct groups 

I tried to create a dummy variable for ethnicity, but I am still getting the same error.

Can someone help me?

Thank you :)


Solution

  • The dissimilarity index calculation fails because, by definition, the index compares exactly two groups (in the literature, this is usually a Black-White dissimilarity index). Your data has 6 race/ethnicity groups, so you can either a) calculate the index for each pair of groups (e.g., White-Black, White-Asian, Black-Asian, etc.); b) pick one race/ethnicity as the reference group and collapse all the other categories together (e.g., White vs. non-White, where non-White = Black + Asian + Mixed + Other + Unknown); or c) use a different segregation index that is designed for multiple race/ethnicity groups.
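The three options could look roughly like the sketch below. It assumes the segregation package is installed and uses a small made-up data frame (`toy`, with invented counts) in the same shape as the tibble in the question; the names `toy`, `white`, `collapsed`, `d_white`, `d_pairs`, and `m_total` are illustrative only. Treat it as a starting point rather than a definitive recipe.

```r
library(segregation)

# Toy data in the same shape as the question's tibble (counts are invented).
toy <- data.frame(
  ethnicity = rep(c("White", "Black", "Asian"), times = 2),
  school    = rep(c("school 1", "school 2"), each = 3),
  n         = c(300, 50, 50, 100, 150, 50)
)

# Option b: collapse ethnicity into two groups, then sum the counts
# so there is one row per group and school.
toy$white <- ifelse(toy$ethnicity == "White", "White", "non-White")
collapsed <- aggregate(n ~ white + school, data = toy, FUN = sum)
d_white <- dissimilarity(collapsed, group = "white", unit = "school", weight = "n")

# Option a: one dissimilarity index per pair of groups.
pairs <- combn(unique(toy$ethnicity), 2, simplify = FALSE)
d_pairs <- vapply(pairs, function(p) {
  two_groups <- toy[toy$ethnicity %in% p, ]
  dissimilarity(two_groups, group = "ethnicity", unit = "school", weight = "n")$est
}, numeric(1))
names(d_pairs) <- vapply(pairs, paste, character(1), collapse = "-")

# Option c: a multigroup index (Mutual Information M and Theil's H),
# which accepts all ethnicity categories at once.
m_total <- mutual_total(toy, group = "ethnicity", unit = "school", weight = "n")

print(d_white)
print(d_pairs)
print(m_total)
```

Note that in option b the new two-category column, not the original `ethnicity` column, has to be passed as `group`; otherwise the function still sees more than two groups.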