
Using R, is it possible to get a frequency table where no data exist?


I am producing some demographic tables, to include race, sex, and ethnicity. One of the tables is a crosstab of sex and race by ethnicity (Hispanic / not Hispanic). So far, there are no Hispanic participants in the study, but the table needs to be produced and sent to interested parties (i.e., regulatory agencies).

However, I have not been able to produce a table for the report. Obviously, the table would be all zeroes, but it is not being produced at all. It seems that this is a limitation of trying to calculate something that does not exist...

I have included example data below:


race.in <- read.table(
text = "race eth sex
b   n   f
b   n   f
b   n   f
w   n   f
w   n   m
w   n   m
a   n   m
a   n   m
a   n   f
ai  n   m
ai  n   f
ai  n   m", header = TRUE)

attach(race.in)

race.levels <- c("b", "w", "a", "ai", "nh") 
eth.levels  <- c("h", "n")  # hispanic , not hispanic
sex.levels  <- c("m", "f")


#  this table is fine
table(factor(race, levels = race.levels), factor(sex, levels = sex.levels) )

#  this table is fine
table(factor(eth, levels = eth.levels), factor(sex, levels = sex.levels) )

#  table of race and ethnicity by sex
#  (FUN must use its argument X; otherwise the attached full columns are tabulated for every group)
by(race.in, sex, FUN = function(X) table(factor(X$race, levels = race.levels), factor(X$eth, levels = eth.levels)))

#  produces NULL for level "h", because by() does not call FUN for empty groups
by(race.in, factor(eth, levels = eth.levels), FUN = function(X) table(factor(X$race, levels = race.levels), factor(X$sex, levels = sex.levels)))

Is there any way to produce a table of zeroes? I know it's silly, but we have to report this, even though there is no data for this set of conditions...


Solution

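  • I'm not clear why you don't just convert the variables in your data.frame to factors. table() counts every factor level, including levels with no observations, so the empty "h" cells come out as zeroes and the tables become much easier to create.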

    race.in$race <- factor(race.in$race, race.levels)
    race.in$eth <- factor(race.in$eth, eth.levels)
    race.in$sex <- factor(race.in$sex, sex.levels)
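    table(race.in)               # three-way table in column order: race x eth x sex
    table(race.in[c(1, 3, 2)])   # reordered so ethnicity is the last (layer) dimension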
    # , , eth = h
    # 
    #     sex
    # race m f
    #   b  0 0
    #   w  0 0
    #   a  0 0
    #   ai 0 0
    #   nh 0 0
    # 
    # , , eth = n
    # 
    #     sex
    # race m f
    #   b  0 3
    #   w  2 1
    #   a  2 1
    #   ai 2 1
    #   nh 0 0
    

    You may also be interested in exploring the ftable function (for "flat" tables). For example:

    > ftable(x=race.in, row.vars=1, col.vars=2:3)
         eth h   n  
         sex m f m f
    race            
    b        0 0 0 3
    w        0 0 2 1
    a        0 0 2 1
    ai       0 0 2 1
    nh       0 0 0 0
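
If you need each ethnicity's table as a separate object (closer to the by() attempt in the question), the following is one possible sketch rather than the only way to do it. It assumes the columns are converted to factors as above. The names tab3, hisp, and by.eth are made up for illustration. The first option slices the three-way table; the second relies on split() keeping empty factor levels by default (drop = FALSE), so the "h" group becomes a zero-row data frame that table() can still tabulate.

```r
# Rebuild the example data from the question so this sketch runs on its own.
race.in <- read.table(
text = "race eth sex
b   n   f
b   n   f
b   n   f
w   n   f
w   n   m
w   n   m
a   n   m
a   n   m
a   n   f
ai  n   m
ai  n   f
ai  n   m", header = TRUE)

# Convert every column to a factor with the full set of reportable levels.
race.in$race <- factor(race.in$race, levels = c("b", "w", "a", "ai", "nh"))
race.in$eth  <- factor(race.in$eth,  levels = c("h", "n"))
race.in$sex  <- factor(race.in$sex,  levels = c("m", "f"))

# Option 1: build the three-way table once, then pull out one ethnicity layer.
tab3 <- table(race.in[c("race", "sex", "eth")])
hisp <- tab3[, , "h"]   # race x sex counts for Hispanic participants

# Option 2: one race x sex table per ethnicity level.
# split() keeps the empty "h" level, so lapply() still calls the function for it.
by.eth <- lapply(split(race.in, race.in$eth),
                 function(d) table(d$race, d$sex))

print(hisp)
print(by.eth$h)
```

Both hisp and by.eth$h hold the race by sex counts for the "h" level, which can be passed to whatever produces the report.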