
The network package changes the behaviour of summary for character vectors, breaking summary.data.frame (all values are printed, each preceded by "NULL:")


After loading the network package, summary.data.frame misbehaves: if a column of class "character" is present, summary prints the value of every row, each prefixed with NULL:, instead of the usual output. Here's a toy example:

test <- data.frame(a=c("some", "char", "vector", "with", 
                       "many", "many", "words"),
                   b=1:7, stringsAsFactors = FALSE)

# Expected behaviour

summary(test$a)

##    Length     Class      Mode 
##         7 character character

summary(test)

##       a                   b      
##  Length:7           Min.   :1.0  
##  Class :character   1st Qu.:2.5  
##  Mode  :character   Median :4.0  
##                     Mean   :4.0  
##                     3rd Qu.:5.5  
##                     Max.   :7.0

library("network")

## network: Classes for Relational Data
## Version 1.13.0 created on 2015-08-31.
## ...

# Behavior after loading network:

summary(test$a)

##   char   many   some vector   with  words 
##      1      2      1      1      1      1

summary(test)

##     a                b      
##  NULL:some     Min.   :1.0  
##  NULL:char     1st Qu.:2.5  
##  NULL:vector   Median :4.0  
##  NULL:with     Mean   :4.0  
##  NULL:many     3rd Qu.:5.5  
##  NULL:many     Max.   :7.0  
##  NULL:words

Note that the output includes every element of the character vector, repetitions included, so a data frame with 1000 rows produces 1000 lines of summary, which makes the summary function unusable. The behavior persists after detaching the network package, until R is restarted.

What goes wrong: normally, UseMethod("summary") dispatches a character vector to summary.default, which returns the usual output as a named vector:

summary.default(test$a)

##    Length     Class      Mode 
##         7 character character

names(summary.default(test$a))

## [1] "Length" "Class"  "Mode"
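To check which method summary() will dispatch to for a character vector in the current session, a quick diagnostic along these lines may help. It is a sketch built on utils::getS3method() with optional = TRUE, which returns NULL when no summary.character method is visible; in a fresh session without network loaded it should report that summary.default() is used.

```r
# Look up a summary() method for class "character"; NULL if none is visible
m <- utils::getS3method("summary", "character", optional = TRUE)

if (is.null(m)) {
  # No specific method: UseMethod() falls back to summary.default()
  message("summary.character not found; summary.default() will be used")
} else {
  # Report the environment (e.g. a package namespace) that defines the method
  message("summary.character is defined in: ", environmentName(environment(m)))
}

# In a session without network, the result carries the Length/Class/Mode names
print(names(summary(c("some", "char", "vector"))))
```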

The network package defines a summary.character method, which simply adds a "summary.character" class to the character vector so that printing it calls network::print.summary.character, which shows a table of up to the 10 most frequent values. The object itself is otherwise unchanged, so names() returns NULL.

summary.character

## function (object, ...) 
## {
##     class(object) <- c("summary.character", class(object))
##     object
## }
## <environment: namespace:network>

summary.character(test$a)

##   char   many   some vector   with  words 
##      1      2      1      1      1      1

names(summary.character(test$a))

## NULL

class(summary.character(test$a))

## [1] "summary.character" "character"

length(summary.character(test$a))

## [1] 7

as.character(summary.character(test$a))

## [1] "some"   "char"   "vector" "with"   "many"   "many"   "words"
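The mechanism can be reproduced without loading network. The sketch below copies the body of network's summary.character shown above into a function with the hypothetical name mimic_summary_character (so it does not take over summary() dispatch), and pairs it with a print method that only approximates network's print.summary.character; the real implementation may differ. It is best run in a fresh session, since a print.summary.character defined in the global environment would shadow network's. It shows that the returned object is the original column with an extra class, no names, and the same length.

```r
# Same body as network's summary.character, under a hypothetical name so that
# summary() dispatch in this session is left untouched
mimic_summary_character <- function(object, ...) {
  class(object) <- c("summary.character", class(object))
  object
}

# Hypothetical approximation of network's print method (not its actual code):
# tabulate the values and, if there are many distinct ones, keep the most
# frequent. Defined in the global environment, so use a fresh session.
print.summary.character <- function(x, max.values = 10L, ...) {
  tab <- table(unclass(x))
  if (length(tab) > max.values) {
    tab <- sort(tab, decreasing = TRUE)[seq_len(max.values)]
  }
  print(tab, ...)
  invisible(x)
}

a <- c("some", "char", "vector", "with", "many", "many", "words")
s <- mimic_summary_character(a)

print(s)    # frequency table, printed by the method above
names(s)    # NULL: nothing for summary.data.frame to use as labels
length(s)   # 7: the whole column, not a 3-element summary
```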

The trouble comes from these three lines in summary.data.frame:

        sms <- format(sms, digits = digits)
        lbs <- format(names(sms))
        sms <- paste0(lbs, ":", sms, "  ")

These lines sit inside a for loop over the columns, where sms holds the output of summary for the current column. When that output comes from summary.character, sms is the whole column and names(sms) is NULL, hence one "NULL:"-prefixed line per row.
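The effect of those three lines can be isolated by wrapping them in a small function. paste_summary_labels is a hypothetical helper name, not part of base R; it simply runs the quoted lines on whatever it is given. A named 3-element summary yields three labelled cells, while an unnamed 7-element column yields seven cells, one per row. How the missing names are rendered as a label may vary between R versions, so the sketch only relies on the number of cells.

```r
# Hypothetical helper wrapping the three quoted lines from summary.data.frame
paste_summary_labels <- function(sms, digits = 4L) {
  sms <- format(sms, digits = digits)   # format the summary values
  lbs <- format(names(sms))             # labels are taken from the names
  paste0(lbs, ":", sms, "  ")           # one output cell per element
}

a <- c("some", "char", "vector", "with", "many", "many", "words")

# Named summary from summary.default(): three labelled cells
ok <- paste_summary_labels(summary.default(a))
print(ok)

# Unnamed column (what summary.character hands over): one cell per row
bad <- paste_summary_labels(unclass(a))
print(bad)
```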

The root cause is that summary.character returns the original object instead of a summary of it, delegating the actual summarizing to print.summary.character. summary.data.frame never calls that print method: it just pastes the object together with the other column summaries, dumping the whole column.
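For contrast, a method that computed its summary up front, in the style of summary.factor (counts, with rare values folded into "(Other)"), would return a short named vector that summary.data.frame could label. The function below is a hypothetical sketch with a non-dispatching name, summarise_character_counts; it is not network's code and does not replace any registered method.

```r
# Hypothetical eager summary for character vectors: returns named counts,
# most frequent first, folding the tail into "(Other)" like summary.factor
summarise_character_counts <- function(object, maxsum = 7L, ...) {
  tab <- sort(table(object, useNA = "ifany"), decreasing = TRUE)
  counts <- as.integer(tab)
  names(counts) <- names(tab)
  if (length(counts) > maxsum) {
    kept <- counts[seq_len(maxsum - 1L)]
    counts <- c(kept, "(Other)" = sum(counts) - sum(kept))
  }
  counts
}

a <- c("some", "char", "vector", "with", "many", "many", "words")
summarise_character_counts(a)               # 6 named counts, "many" = 2 first
summarise_character_counts(a, maxsum = 3L)  # 2 named counts plus "(Other)"
```

Because the result is short and named, the label-pasting lines in summary.data.frame would have something meaningful to work with.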

Any ideas on how to fix this without diving into the sources of network would be much appreciated.


Solution

  • I found a workaround for this; unfortunately it involves "polluting" the R namespace a bit more (to cite @steveb's comments) by defining a function format.summary.character that restores the expected behavior of the code inside summary.data.frame. The function is inspired by format.factor:

    format.summary.character <- function(x, ...) {
        s <- summary.default(as.character(x), ...)
        format(structure(as.character(s), names = names(s), dim = dim(s), 
                         dimnames = dimnames(s)), ...)
    }
    

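    After defining this function, summary for a character vector is still handled by summary.character, but summary.data.frame goes back to its normal output. This works because, in the three lines quoted in the question, summary.data.frame calls format(sms) before names(sms): format now dispatches to format.summary.character, which returns the formatted summary.default output together with its Length/Class/Mode names.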

    summary(test$a) # still calling summary.character
    
    ##   char   many   some vector   with  words 
    ##      1      2      1      1      1      1
    
    summary(test)   # back to normal
    
    ##       a                   b      
    ##  Length:7           Min.   :1.0  
    ##  Class :character   1st Qu.:2.5  
    ##  Mode  :character   Median :4.0  
    ##                     Mean   :4.0  
    ##                     3rd Qu.:5.5  
    ##                     Max.   :7.0  
    ##
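The workaround can be checked without loading network. The sketch below builds an object with the class vector that network's summary.character produces (the column itself plus a "summary.character" class), which is an assumption standing in for network's real output, and then runs the quoted summary.data.frame lines in the same order: format first, then read the labels from the formatted result. The format.summary.character body is the one defined above.

```r
# Workaround from above: format a summary.character object as summary.default would
format.summary.character <- function(x, ...) {
  s <- summary.default(as.character(x), ...)
  format(structure(as.character(s), names = names(s), dim = dim(s),
                   dimnames = dimnames(s)), ...)
}

# Stand-in for what network's summary.character returns: the column itself,
# with an extra class and no names
sms <- structure(c("some", "char", "vector", "with", "many", "many", "words"),
                 class = c("summary.character", "character"))

# Same order as the quoted summary.data.frame lines
sms <- format(sms, digits = 4L)   # dispatches to format.summary.character
lbs <- format(names(sms))         # names now exist: Length, Class, Mode
cells <- paste0(lbs, ":", sms, "  ")
print(cells)                      # three labelled cells instead of seven
```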