
Summary of gender


I am running a summary of my data and when I do summary(sn$Gender) I get:

Length: 202 (I have 202 responses), Class: character, Mode: character

It should say female 99 and male 103. Any thoughts on why this is happening?


Solution

  • summary is a generic function, so its output depends on which method is dispatched for the class of its argument. The available methods are:

    methods('summary')
    #[1] summary.aov                    summary.aovlist*               summary.aspell*               
    #[4] summary.check_packages_in_dir* summary.connection             summary.data.frame            
    #[7] summary.Date                   summary.default                summary.ecdf*                 
    #[10] summary.factor                 summary.glm                    summary.infl*                 
    #[13] summary.lm                     summary.loess*                 summary.manova                
    #[16] summary.matrix                 summary.mlm*                   summary.nls*                  
    #[19] summary.packageStatus*         summary.PDF_Dictionary*        summary.PDF_Stream*           
    #[22] summary.POSIXct                summary.POSIXlt                summary.ppr*                  
    #[25] summary.prcomp*                summary.princomp*              summary.proc_time             
    #[28] summary.srcfile                summary.srcref                 summary.stepfun               
    #[31] summary.stl*                   summary.table                  summary.tukeysmooth*   
    

    For a factor, summary.factor is called. The list above has no summary.character method, so a character vector falls through to summary.default, and the result depends on the conditions inside summary.default:

     if (is.factor(object)) 
        return(summary.factor(object, ...))
     .
     .
     .
    
     else if (is.recursive(object) && !is.language(object) && 
           (n <- length(object))) {
         sumry <- array("", c(n, 3L), list(names(object), c("Length", 
             "Class", "Mode")))
     .
     .
    
      else c(Length = length(object), Class = class(object), Mode = mode(object))
     .
     .
    

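    A character vector matches none of the earlier branches, so the final else branch applies and summary returns only the 'Length', 'Class', and 'Mode'.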
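
    The dispatch behaviour can be checked on a small vector. This is a minimal sketch; the vector x and the names res_chr and res_fct are made up for illustration and are not the asker's data.

    ```r
    # Hypothetical toy vector (not the data from the question)
    x <- c("male", "female", "female")

    class(x)                 # "character"
    res_chr <- summary(x)    # handled by summary.default
    names(res_chr)           # "Length" "Class" "Mode"

    f <- factor(x)
    class(f)                 # "factor"
    res_fct <- summary(f)    # handled by summary.factor
    res_fct                  # counts per level: female 2, male 1
    ```

    The same values give two different summaries purely because of the class of the object.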

    One option is to convert the column to a factor and then call summary; another is to call summary.factor directly:

    class(sn$Gender)
    #[1] "character"
    
    summary(sn$Gender)
    #Length     Class      Mode 
    #  202 character character 
    
    
    summary.factor(sn$Gender)
    # female   male 
    #   93    109 
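
    The factor-conversion option mentioned above might look like the sketch below. It rebuilds the simulated sn data from this answer so that it runs on its own; counts and counts2 are illustrative names. No expected counts are shown because the values drawn by sample() can differ between R versions.

    ```r
    # Simulated stand-in data, same construction as the example data in this answer
    set.seed(24)
    sn <- data.frame(Gender = sample(c("male", "female"), 202, replace = TRUE),
                     stringsAsFactors = FALSE)

    # Convert on the fly; the column itself stays character
    counts <- summary(factor(sn$Gender))

    # Or convert the column once, so later summary() calls dispatch to summary.factor
    sn$Gender <- factor(sn$Gender)
    counts2 <- summary(sn$Gender)

    summary(sn)   # the data-frame summary now reports counts per level as well
    ```

    Converting the column once is probably the more convenient choice if Gender will also be used as a grouping variable later.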
    

    A simpler way to avoid this confusion is to use table(sn$Gender), which counts each distinct value whether the column is character or factor.
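
    As a minimal sketch of the table() approach, using a hypothetical toy vector g (with one missing value added to show how NA is handled):

    ```r
    # Hypothetical toy vector (not the data from the question)
    g <- c("male", "female", NA, "female")

    tab <- table(g)                       # counts per distinct value; NA is dropped by default
    tab_na <- table(g, useNA = "ifany")   # also counts missing values when present
    prop.table(tab)                       # proportions instead of counts

    tab      # female 2, male 1
    tab_na   # female 2, male 1, <NA> 1
    ```

    table() works directly on the character column, so no conversion is needed.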

    data (simulated, so the counts differ from those in the question)

    set.seed(24)
    sn <- data.frame(Gender = sample(c('male', 'female'), 202, 
                          replace = TRUE), stringsAsFactors = FALSE)