I am running a summary of my data and when I do summary(sn$Gender) I get:
Length: 202 (I have 202 responses) Class: Character Mode: Character
It should say female 99 and male 103. Any thoughts on why this is happening?
The reason lies in the S3 methods for summary and in which of them actually gets called:
methods('summary')
#[1] summary.aov summary.aovlist* summary.aspell*
#[4] summary.check_packages_in_dir* summary.connection summary.data.frame
#[7] summary.Date summary.default summary.ecdf*
#[10] summary.factor summary.glm summary.infl*
#[13] summary.lm summary.loess* summary.manova
#[16] summary.matrix summary.mlm* summary.nls*
#[19] summary.packageStatus* summary.PDF_Dictionary* summary.PDF_Stream*
#[22] summary.POSIXct summary.POSIXlt summary.ppr*
#[25] summary.prcomp* summary.princomp* summary.proc_time
#[28] summary.srcfile summary.srcref summary.stepfun
#[31] summary.stl* summary.table summary.tukeysmooth*
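summary is an S3 generic. It looks at class() of its argument and calls summary.<class> if such a method exists, falling back to summary.default otherwise. The sketch below shows the same rule in isolation. The generic describe and its methods are hypothetical and exist only for illustration. The last line uses utils::getS3method() to check that base R has no summary.character method.

```r
# `describe` is a hypothetical generic, used only to illustrate S3 dispatch
describe <- function(x, ...) UseMethod("describe")

# Method chosen when class(x) is "factor": count each level
describe.factor <- function(x, ...) table(x)

# Fallback for any class without its own method, e.g. "character"
describe.default <- function(x, ...) c(Length = length(x), Class = class(x))

g <- c("female", "male", "male")
describe(g)          # no describe.character exists, so describe.default runs
describe(factor(g))  # describe.factor runs

# The same lookup for summary(): base R has no summary.character method
is.null(utils::getS3method("summary", "character", optional = TRUE))  # TRUE
```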
On an object of class factor, summary.factor is called. A character vector has no summary.character method (it is absent from the list above), so summary calls summary.default instead. The relevant conditions in summary.default are:
if (is.factor(object))
return(summary.factor(object, ...))
.
.
.
else if (is.recursive(object) && !is.language(object) &&
(n <- length(object))) {
sumry <- array("", c(n, 3L), list(names(object), c("Length",
"Class", "Mode")))
.
.
else c(Length = length(object), Class = class(object), Mode = mode(object))
.
.
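A character vector is not a factor, and is.recursive() is FALSE for an atomic vector, so it falls through to the final else, which returns only 'Length', 'Class', and 'Mode'.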
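A quick way to confirm which branch is taken is to check the two conditions directly and then look at what summary() returns for a small character vector. This sketch uses a made-up three-element vector rather than the real data.

```r
g <- c("female", "male", "male")

# A character vector is not a factor, and atomic vectors are not recursive
is.factor(g)      # FALSE
is.recursive(g)   # FALSE

# So summary() reaches the final else branch of summary.default
s <- summary(g)
names(s)          # "Length" "Class"  "Mode"
unclass(s)        # the values are stored as strings: "3", "character", "character"
```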
One option is either to convert the column to a factor and then call summary, or to call summary.factor directly:
class(sn$Gender)
#[1] "character"
summary(sn$Gender)
#Length Class Mode
# 202 character character
summary.factor(sn$Gender)
# female male
# 93 109
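The first option, converting the column itself, might look like the sketch below. It recreates the simulated data from this answer; the exact counts depend on the R version's sampling algorithm, so none are shown.

```r
# Recreate the simulated data from this answer
set.seed(24)
sn <- data.frame(Gender = sample(c("male", "female"), 202, replace = TRUE),
                 stringsAsFactors = FALSE)

# Convert the column itself; summary() then dispatches to summary.factor
sn$Gender <- factor(sn$Gender)
class(sn$Gender)     # "factor"
summary(sn$Gender)   # one count per level (female, male)

# summary() on the whole data frame now also shows the counts for Gender
summary(sn)
```

Converting the column once means every later call (summary, plotting, modelling functions) sees a factor, whereas summary.factor() only fixes that one call.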
But we can avoid this confusion altogether with table(sn$Gender), which counts the values whether the column is character or factor.
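A minimal sketch of the table() approach on a plain character vector (simulated the same way as the data below), checking that it gives the same counts as summary() after a factor conversion:

```r
set.seed(24)
gender <- sample(c("male", "female"), 202, replace = TRUE)  # character vector

# table() counts values directly; no factor conversion is needed
counts <- table(gender)
counts

# The counts match what summary() gives after converting to a factor
all(counts == summary(factor(gender)))  # TRUE
```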
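The example data (simulated, so the counts differ from those in the question):
set.seed(24)
# stringsAsFactors = FALSE keeps Gender as a character column
sn <- data.frame(Gender = sample(c('male', 'female'), 202,
replace = TRUE), stringsAsFactors = FALSE)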
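The question does not say how the data were imported. If they were read with read.csv() in R 4.0.0 or later, text columns come in as character by default, which would explain the behaviour. The sketch below assumes that scenario; the CSV content and the Age column are hypothetical stand-ins for the real file.

```r
# Hypothetical CSV content standing in for the real data file
csv <- "Gender,Age\nfemale,34\nmale,29\nmale,41"

# Default since R 4.0.0: text columns stay character
sn_chr <- read.csv(text = csv)
class(sn_chr$Gender)    # "character" (in R >= 4.0.0)

# Ask for factors at read time instead
sn_fac <- read.csv(text = csv, stringsAsFactors = TRUE)
class(sn_fac$Gender)    # "factor"
summary(sn_fac$Gender)  # female 1, male 2
```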