Search code examples
rdataframeplotsapply

Grouping of Stem- Leaf plot in R


I have a got a stem-leaf plot as a question, which looks like this:

0 | 6
1 | 179
2 | 26
3 | 2478
4 | 15699
5 | 368
6 | 24457
7 | 
8 | 56

So, created a vector on my own which would create a stem plot as same as the above one.

data <- c(06,11,17,19,22,26,32,34,37,38,41,45,46,49,49,53,56,58,62,64,64,65,67,7,85,86)

What I have to do is that, I need to group the stems by 2 and then plot the corresponding stem plot of that using R.

The solution would look somewhat like this:

0-2|6*179*26
3-5|2478*15699*368
6-8|244457**56

"*" is used to separate the leaves of each stem in the group. i.e. For the group stem 0-2, it represents that leaf 6 in the first row corresponds to stem 0; leaves 1,7 and 9 correspond to stem 1 and leaves 2 and 6 corresponds to stem 2.

I found no use in stem(), so thought of separating the data by 2 using "interval" function, and then build one user-defined functions but it resulted out in giving me the same stem value.

Is there any way to get the desired solution, either by using in-built function/ by user-defined? Thanks a lot in advance.!!


Solution

  • This isn't going to win any beauty contest, but you can definitely use a combination of cut and some string processing to create your own grouped stem function.

    Here's an example function, commented so you can extend it to suit your actual needs:

    grouped_stem <- function(invec, n = 3) {
      # Sequence of lowest tens and highest tens in the input data, by 10
      cuts <- seq((min(invec) %/% 10) * 10, round(max(invec), -(nchar(max(invec))-1)), 10)
      # For pretty labels in `cut`
      labs <- sub("(.*).$", "\\1", cuts)
      labs <- replace(labs, !nzchar(labs), "0")
      # List of the values according to their `cut` intervals
      temp <- split(invec, cut(invec, cuts, labs[-length(labs)], right = FALSE))
      # Only interested in the last digit
      temp <- relist(sub(".*(.)$", "\\1", unlist(temp, use.names = FALSE)), temp)
      # Paste the values together. Add in a "*" that we can get rid of later if not required
      combined <- vapply(temp, function(y) sprintf("%s*", paste(y, collapse = "")), character(1L))
      # Split by number of groups of tens per stem
      splits <- split(combined, ((seq_along(combined)-1) %/% n))
      # Construct the stems and leaves
      stems <- vapply(splits, function(x) {
        paste(names(x)[1], names(x)[length(x)], sep = " to ")
      }, character(1L))
      leaves <- vapply(splits, function(x) {
        sub("[*]$", "", paste(x, sep = "", collapse = ""))
      }, character(1L))
      # Print and store
      cat(sprintf(sprintf("%%%ss | %%s", max(nchar(stems))+2), stems, leaves), sep = "\n")
      invisible(setNames(as.list(leaves), stems))
    }
    

    Run on your sample data, it produces:

    grouped_stem(data)
    ##   0 to 2 | 67*179*26
    ##   3 to 5 | 2478*15699*368
    ##   6 to 8 | 24457**56