r ggplot2 charts

How to display a factor in scientific notation


I have a data frame whose values span several orders of magnitude, so a log scale would be the natural choice for the graph. Something like this:

df <- data.frame(
  Type = c("A", "B", "C", "D"),
  Value = c(1e3, 2e3, 4e5, 8e6),
  Efficiency = c(70, 72, 80, 88)
)

But instead of treating the 'Value' column as numeric, I want to treat it as a factor, so that the graph has just 4 equally spaced data points (i.e. 1e3, 2e3, 4e5, 8e6) rather than a continuous, log-scaled x-axis, where the points would not be equally spaced and the exact value of each entry would not be shown. So I converted 'Value' to a factor:

  # Convert the selected columns to factors
  factor_cols <- c("Type", "Value")
  df[factor_cols] <- lapply(df[factor_cols], factor)

The problem is that I need the x-axis labels in scientific notation, but the lower values are always shown in fixed notation and only the higher values in scientific notation. The code I used:

  # Plot summary graph
  ggplot(df, aes(x = Value, y = Efficiency, color = Type)) +
    geom_point(size = 3, alpha = 0.7) 

Only the two larger values are shown in scientific notation.

I tried using scale_x_discrete(), but it doesn't seem to help.

How do I show all values of the 'Value' factor in scientific notation? (I'm not even sure whether to call it a string or a data type.) Also, if 'Value' were kept numeric, is there a way to format the graph so that the data points are equally spaced on the x-axis while the axis still shows the respective data values?


Solution

  • You can use format(.., scientific=TRUE) or scales::label_scientific().

    library(ggplot2)

    df <- structure(list(Type = c("A", "B", "C", "D"), Value = c(1000, 2000, 4e+05, 8e+06), Efficiency = c(70, 72, 80, 88)), class = "data.frame", row.names = c(NA, -4L))
    
    df |>
      transform(
        Type = factor(Type),
        Value = reorder(factor(format(Value, scientific=TRUE)), Value)
      ) |>
      ggplot(aes(x = Value, y = Efficiency, color = Type)) +
      geom_point(size = 3, alpha = 0.7)
    


    reorder() is used so that the levels are ordered by magnitude, not alphabetically (the default behavior of factors).
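
An alternative, as a minimal sketch rather than the only way to do it: keep 'Value' numeric in the data frame, convert it to a factor only inside aes(), and relabel the discrete axis. factor() builds its level labels with as.character(), which writes 1000 and 2000 in fixed notation but 4e+05 and 8e+06 in scientific notation, and that is where the mixed axis labels come from. The helper name sci_labels below is hypothetical (it is not part of ggplot2); it assumes every level can be parsed back into a number.

```r
library(ggplot2)

df <- data.frame(
  Type = c("A", "B", "C", "D"),
  Value = c(1e3, 2e3, 4e5, 8e6),
  Efficiency = c(70, 72, 80, 88)
)

# The level labels are created by as.character(), hence the mixed notation
levels(factor(df$Value))
# "1000"  "2000"  "4e+05" "8e+06"

# Hypothetical helper: parse the level strings back to numbers,
# then format all of them in scientific notation
sci_labels <- function(breaks) {
  format(as.numeric(breaks), scientific = TRUE)
}

# factor(Value) gives equally spaced positions, ordered by magnitude;
# the labels function only changes how each tick is printed
p <- ggplot(df, aes(x = factor(Value), y = Efficiency, color = Type)) +
  geom_point(size = 3, alpha = 0.7) +
  scale_x_discrete(labels = sci_labels) +
  labs(x = "Value")

print(p)
```

Because the levels here come from the numeric column itself, they are already sorted by magnitude, so reorder() is not needed in this variant. This also covers the second part of the question: the column stays numeric in df, and only the plot treats it as discrete.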