Search code examples
rbenford.analysis

Properly calling variable name when creating multiple Benford plots


I am creating Benford plots for all the numeric variables in my dataset. https://en.wikipedia.org/wiki/Benford%27s_law

Running a single variable

#install.packages("benford.analysis")
library(benford.analysis)
plot(benford(iris$Sepal.Length))

Looks great. And the legend says "Dataset: iris$Sepal.Length", perfect!.

Benford 1

Using apply to run 4 variables,

apply(iris[1:4], 2, function(x) plot(benford(x)))

Creates four plots, however, each plot's legend says "Dataset: x"

Benford 2

I attempted to use a for loop,

for (i in colnames(iris[1:4])){
  plot(benford(iris[[i]]))
}

This creates four plots, but now the legends says "Dataset: iris[[i]]". And I would like the name of the variable on each chart.

Benford 3

I tried a different loop, hoping to get titles with an evaluated parsed string like "iris$Sepal.Length":

for (i in colnames(iris[1:4])){
  plot(benford(eval(parse(text=paste0("iris$", i)))))
}

But now the legend says "Dataset: eval(parse(text=paste0("iris$", i)))".

Benford 4

AND, Now I've run into the infamous eval(parse(text=paste0( (eg: How to "eval" results returned by "paste0"? and R: eval(parse(...)) is often suboptimal )

I would like labels such as "Dataset: iris$Sepal.Length" or "Dataset: Sepal.Length". How can I create multiple plots with meaningfully variable names in the legend?


Solution

  • This is happening because of the first line within the benford function=:

    benford <- function(data, number.of.digits = 2, sign = "positive", discrete=TRUE, round=3){
    
      data.name <- as.character(deparse(substitute(data)))
    

    Source: https://github.com/cran/benford.analysis/blob/master/R/functions-new.R

    data.name is then used to name your graph. Whatever variable name or expression you pass to the function will unfortunately be caught by the deparse(substitute()) call, and will be used as the name for your graph.


    One short-term solution is to copy and rewrite the function:

    #install.packages("benford.analysis")
    library(benford.analysis)
    #install.packages("data.table")
    library(data.table) # needed for function
    
    # load hidden functions into namespace - needed for function
    r <- unclass(lsf.str(envir = asNamespace("benford.analysis"), all = T))
    for(name in r) eval(parse(text=paste0(name, '<-benford.analysis:::', name)))
    
    
    benford_rev <- function{} # see below
    
    for (i in colnames(iris[1:4])){
       plot(benford_rev(iris[[i]], data.name = i))
    }
    

    enter image description here

    enter image description here

    This has negative side effects of:

    • Not being maintainable with package revisions
    • Fills your GlobalEnv with normally hidden functions in the package

    So hopefully someone can propose a better way!


    benford_rev <- function(data, number.of.digits = 2, sign = "positive", discrete=TRUE, round=3, data.name = as.character(deparse(substitute(data)))){ # changed
    
     # removed line
    
      benford.digits <- generate.benford.digits(number.of.digits)
    
      benford.dist <- generate.benford.distribution(benford.digits)
    
      empirical.distribution <- generate.empirical.distribution(data, number.of.digits,sign, second.order = FALSE, benford.digits)
    
      n <- length(empirical.distribution$data)
    
      second.order <- generate.empirical.distribution(data, number.of.digits,sign, second.order = TRUE, benford.digits, discrete = discrete, round = round)
    
      n.second.order <- length(second.order$data)
    
      benford.dist.freq <- benford.dist*n
    
      ## calculating useful summaries and differences
      difference <- empirical.distribution$dist.freq - benford.dist.freq
    
      squared.diff <- ((empirical.distribution$dist.freq - benford.dist.freq)^2)/benford.dist.freq
    
      absolute.diff <- abs(empirical.distribution$dist.freq - benford.dist.freq)
    
      ### chi-squared test
      chisq.bfd <- chisq.test.bfd(squared.diff, data.name)
    
      ### MAD
      mean.abs.dev <- sum(abs(empirical.distribution$dist - benford.dist)/(length(benford.dist)))
    
      if (number.of.digits > 3) {
        MAD.conformity <- NA
      } else {
        digits.used <- c("First Digit", "First-Two Digits", "First-Three Digits")[number.of.digits]  
        MAD.conformity <- MAD.conformity(MAD = mean.abs.dev, digits.used)$conformity
      }
    
    
    
    
    
      ### Summation
      summation <- generate.summation(benford.digits,empirical.distribution$data, empirical.distribution$data.digits)
      abs.excess.summation <- abs(summation - mean(summation))
    
      ### Mantissa
      mantissa <- extract.mantissa(empirical.distribution$data)
      mean.mantissa <- mean(mantissa)
      var.mantissa <- var(mantissa)
      ek.mantissa <- excess.kurtosis(mantissa)
      sk.mantissa <- skewness(mantissa)
    
      ### Mantissa Arc Test
      mat.bfd <- mantissa.arc.test(mantissa, data.name)
    
      ### Distortion Factor
      distortion.factor <- DF(empirical.distribution$data)  
    
      ## recovering the lines of the numbers
      if (sign == "positive") lines <- which(data > 0 & !is.na(data))
      if (sign == "negative") lines <- which(data < 0 & !is.na(data))
      if (sign == "both")     lines <- which(data != 0 & !is.na(data))
      #lines <- which(data %in% empirical.distribution$data)
    
      ## output
      output <- list(info = list(data.name = data.name,
                                 n = n,
                                 n.second.order = n.second.order,
                                 number.of.digits = number.of.digits),
    
                     data = data.table(lines.used = lines,
                                       data.used = empirical.distribution$data,
                                       data.mantissa = mantissa,
                                       data.digits = empirical.distribution$data.digits),
    
                     s.o.data = data.table(second.order = second.order$data,
                                           data.second.order.digits = second.order$data.digits),
    
                     bfd = data.table(digits = benford.digits,
                                      data.dist = empirical.distribution$dist,
                                      data.second.order.dist = second.order$dist,
                                      benford.dist = benford.dist,
                                      data.second.order.dist.freq = second.order$dist.freq,
                                      data.dist.freq = empirical.distribution$dist.freq,
                                      benford.dist.freq = benford.dist.freq,
                                      benford.so.dist.freq = benford.dist*n.second.order,
                                      data.summation = summation,
                                      abs.excess.summation = abs.excess.summation,
                                      difference = difference,
                                      squared.diff = squared.diff,
                                      absolute.diff = absolute.diff),
    
                     mantissa = data.table(statistic = c("Mean Mantissa", 
                                                         "Var Mantissa", 
                                                         "Ex. Kurtosis Mantissa",
                                                         "Skewness Mantissa"),
                                           values = c(mean.mantissa = mean.mantissa,
                                                      var.mantissa = var.mantissa,
                                                      ek.mantissa = ek.mantissa,
                                                      sk.mantissa = sk.mantissa)),
                     MAD = mean.abs.dev,
    
                     MAD.conformity = MAD.conformity,
    
                     distortion.factor = distortion.factor,
    
                     stats = list(chisq = chisq.bfd,
                                  mantissa.arc.test = mat.bfd)
      )
    
      class(output) <- "Benford"
    
      return(output)
    
    }