After loading the network package, I have an issue with the summary.data.frame function: if a column of class "character" is present, then instead of the usual output, summary prints the values from all rows, each prefixed with NULL. Here's a toy example:
test <- data.frame(a = c("some", "char", "vector", "with",
                         "many", "many", "words"),
                   b = 1:7, stringsAsFactors = FALSE)
# Expected behavior:
summary(test$a)
## Length Class Mode
## 7 character character
summary(test)
## a b
## Length:7 Min. :1.0
## Class :character 1st Qu.:2.5
## Mode :character Median :4.0
## Mean :4.0
## 3rd Qu.:5.5
## Max. :7.0
library("network")
## network: Classes for Relational Data
## Version 1.13.0 created on 2015-08-31.
## ...
# Behavior after loading network:
summary(test$a)
## char many some vector with words
## 1 2 1 1 1 1
summary(test)
## a b
## NULL:some Min. :1.0
## NULL:char 1st Qu.:2.5
## NULL:vector Median :4.0
## NULL:with Mean :4.0
## NULL:many 3rd Qu.:5.5
## NULL:many Max. :7.0
## NULL:words
Note that the output includes every element of the character vector, repetitions included, so a data frame with 1000 rows yields 1000 lines of summary, which makes the summary function unusable. The behavior persists after detaching the network package and only goes away when a new R session is started.
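A quick way to see why detaching does not help: as far as I understand it, detach() only removes the package from the search path, while the namespace stays loaded and its registered S3 methods stay in effect (?detach mentions that registered S3 methods are not removed). The following is a minimal diagnostic sketch using only base and utils functions; it runs whether or not network is installed.

```r
# Look up an S3 method for summary() on class "character".
# optional = TRUE makes getS3method() return NULL instead of raising an error.
m <- utils::getS3method("summary", "character", optional = TRUE)

if (is.null(m)) {
  message("No summary.character method found; summary.default will be used.")
} else {
  # environmentName() reports which environment or namespace the method lives in.
  message("summary.character found in: ", environmentName(environment(m)))
}

# A namespace can still be loaded even though the package is no longer attached.
"network" %in% loadedNamespaces()
"package:network" %in% search()
```

If the first check still reports the method after detach("package:network"), summary() on a character vector will keep dispatching to it.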
What goes wrong: normally, UseMethod("summary") dispatches to summary.default for character vectors, which produces the usual output, a vector that has names.
summary.default(test$a)
## Length Class Mode
## 7 character character
names(summary.default(test$a))
## [1] "Length" "Class" "Mode"
The network package defines a summary.character function, which simply prepends a "summary.character" class to the character object, so that printing it calls network::print.summary.character, which produces the table with up to 10 of the most frequent values. The object itself is unchanged, so its names is NULL.
summary.character
## function (object, ...)
## {
## class(object) <- c("summary.character", class(object))
## object
## }
## <environment: namespace:network>
summary.character(test$a)
## char many some vector with words
## 1 2 1 1 1 1
names(summary.character(test$a))
## NULL
class(summary.character(test$a))
## [1] "summary.character" "character"
length(summary.character(test$a))
## [1] 7
as.character(summary.character(test$a))
## [1] "some" "char" "vector" "with" "many" "many" "words"
The trouble comes from these three lines in summary.data.frame:
sms <- format(sms, digits = digits)
lbs <- format(names(sms))
sms <- paste0(lbs, ":", sms, " ")
These lines sit inside a for loop over columns, where sms is the output of summary for the current column. For the output of summary.character, sms is actually the whole column, and names(sms) is NULL, hence the issue.
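To make the difference concrete, here is a small sketch that compares what summary.default returns with a stand-in for what summary.character returns. The stand-in object `bad` is my own construction (an assumption): it mimics the class-tagging shown above so the example runs without the network package.

```r
x <- c("some", "char", "vector", "with", "many", "many", "words")

# What summary.default returns: a short, named vector.
good <- summary.default(x)

# Stand-in for the result of summary.character (assumption: same class
# tagging as shown above, built by hand instead of calling network).
bad <- structure(x, class = c("summary.character", "character"))

names(good)   # "Length" "Class" "Mode"
names(bad)    # NULL
length(good)  # 3
length(bad)   # 7

# The labelling step from summary.data.frame, applied to both objects.
# For `bad` there are no names to use as labels, and one cell is produced
# per row; the exact label text may differ between R versions.
paste0(format(names(good)), ":", unclass(good))
paste0(format(names(bad)), ":", unclass(bad))
```

The first paste0 call yields three labelled cells, the second yields one cell per element of the column, which is the behavior seen in the broken summary(test) output.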
The root cause is that summary.character returns the original object instead of a summary of it; producing the summary is deferred to print.summary.character. summary.data.frame just pastes the returned values together with the other summaries, dumping the whole column.
Any idea on how to fix this without diving into the sources of network would be much appreciated.
I found a workaround for this. Unfortunately, it involves "polluting" the R namespace a bit more (to cite @steveb's comments), by defining a function format.summary.character that restores the expected behavior of the code inside summary.data.frame. The function is inspired by format.factor:
format.summary.character <- function(x, ...) {
  s <- summary.default(as.character(x), ...)
  format(structure(as.character(s), names = names(s), dim = dim(s),
                   dimnames = dimnames(s)), ...)
}
After defining this function, the output of summary for a character vector is still controlled by summary.character, but the output of summary.data.frame goes back to normal.
summary(test$a) # still calling summary.character
## char many some vector with words
## 1 2 1 1 1 1
summary(test) # back to normal
## a b
## Length:7 Min. :1.0
## Class :character 1st Qu.:2.5
## Mode :character Median :4.0
## Mean :4.0
## 3rd Qu.:5.5
## Max. :7.0
##
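For anyone who wants to check the workaround without installing network, here is a rough sanity check. It repeats the format.summary.character definition from above so the block is self-contained, and uses a hand-built stand-in object (`sms`, an assumption that mimics the result of summary.character) to replay the formatting steps from summary.data.frame.

```r
# Same definition as in the workaround above, repeated for self-containment.
format.summary.character <- function(x, ...) {
  s <- summary.default(as.character(x), ...)
  format(structure(as.character(s), names = names(s), dim = dim(s),
                   dimnames = dimnames(s)), ...)
}

# Stand-in for the result of summary.character (assumption: built by hand).
sms <- structure(c("some", "char", "vector", "with", "many", "many", "words"),
                 class = c("summary.character", "character"))

# Replay of the formatting steps (digits omitted for brevity).
sms <- format(sms)            # dispatches to format.summary.character
lbs <- format(names(sms))     # names are now Length / Class / Mode
cells <- paste0(lbs, ":", sms, " ")

length(cells)  # 3 cells instead of one per row
cells
```

Because format() now returns a named three-element vector, the labelling step has proper labels to work with again.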