I have the following data set, which is basically a data frame with 3 columns:
column_A <- rep(sample(300:1000000, 903, replace = FALSE), each = 10)
column_B <- sample(5:25, 9030, replace = TRUE)
df <- data.frame(column_A, column_B)
df$group <- sample(1:4, nrow(df), replace = TRUE)
rm(column_A, column_B)
and I want to generate a graph with geom_point() using the following code:
library(ggplot2)

graph_builder <- function(data_set, y_axis_parameter, category, group) {
  graph <- ggplot(data_set, aes(x = factor({{ category }}), y = {{ y_axis_parameter }})) +
    geom_point() +
    theme(plot.title = element_text(hjust = 0.5)) +
    theme(axis.text.x = element_text(angle = 90, vjust = 0.5)) +
    # one facet row per value of the precomputed group column
    facet_grid(rows = vars({{ group }}), scales = "free_x")
  graph
}
graph_builder(df, column_B, column_A, group)
My real data sets are similar to the generated data frame, and they have a large number of categories on the x-axis (close to 900), so the x-axis labels get cramped and are not readable. I want to make my graph more readable.
My solution: I add a new column named "group" to my data frame and fill it with the values 1 to 4 at random. This assigns a roughly equal number of data points to each of the four groups (1, 2, 3, and 4). But as you saw in the code, I add this grouping column outside of the graph_builder() function.
I think there must be a better way to partition my data frame into four (or five) sub-groups, in such a way that the final graph has four subgraphs. I should mention that in my real data frames the values on the x-axis do not follow a uniform distribution, which leads to different group sizes when using the cut() function. Look at this solution
Question 1: Is there a way I can divide my data set within the graph_builder() function? As you can see, the graph generated by my code is not readable, so any solution that makes it more readable would be really appreciated.
You are asking heroics of your x axis. Here's a version where I've split the chart into 6 facets in order of the category values. This is only barely readable, but it's not obvious to me that much better can be done without a larger format. Maybe IMAX?
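Here I convert the category to a factor and convert that to a number, so num_cat will range from 1 for the smallest column_A value to 903 for the largest. Because this is a rank rather than the raw value, the spacing of the values does not matter. Then we can split the ranks into 6 groups (numbered 0 to 5) of ~equal size with a little math.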
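To see what that arithmetic does, here is a small base R check. The category values are made up for illustration (903 distinct, strongly skewed numbers, not the question's data); it only sketches why splitting on the rank gives near-equal groups where cut() on the raw values does not.

```r
# Made-up stand-in for column_A: 903 distinct values, heavily skewed
category <- round(300 * 1.009^(1:903))

# Rank of each distinct value: 1 for the smallest, 903 for the largest
num_cat <- as.numeric(factor(category))

# Same formula as in the function: maps ranks 1..903 onto groups 0..5
group <- floor(num_cat * 6 / max(num_cat + 1))

rank_sizes <- table(group)              # categories per group, split on rank
cut_sizes  <- table(cut(category, 6))   # categories per bin, split on raw value

print(rank_sizes)
print(cut_sizes)
```

With the rank-based split the group sizes differ by at most one, while the cut() bins on the raw values are very uneven for skewed data like this.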
library(dplyr)
library(ggplot2)

graph_builder <- function(data_set, y_axis_parameter, category) {
  data_set <- data_set %>%
    # num_cat: rank of each category value; group: 0..5, ~equal sizes
    mutate(num_cat = as.numeric(factor({{ category }})),
           group = floor(num_cat * 6 / max(num_cat + 1)))
  ggplot(data_set, aes(x = factor({{ category }}), y = {{ y_axis_parameter }})) +
    geom_point() +
    theme(plot.title = element_text(hjust = 0.5)) +
    theme(axis.text.x = element_text(angle = 90, vjust = 0.5)) +
    # hide the facet strips, since the group number itself carries no meaning
    theme(strip.background = element_blank(), strip.text = element_blank()) +
    facet_wrap(vars(group), scales = "free_x", nrow = 6)
}
graph_builder(df, column_B, column_A)
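If you want four or five facets instead of six, one option is to make the number of groups an argument. The following is only a sketch under a few assumptions: add_facet_group(), graph_builder_n() and n_groups are names I made up, dense_rank() from dplyr stands in for as.numeric(factor()), and the demo data frame is a small stand-in for the question's data.

```r
library(dplyr)
library(ggplot2)

# Hypothetical helper: rank the distinct category values, then split the
# ranks into n_groups bins (numbered 1..n_groups) of near-equal size.
add_facet_group <- function(data_set, category, n_groups = 6) {
  data_set %>%
    mutate(num_cat = dense_rank({{ category }}),
           group = ceiling(num_cat * n_groups / max(num_cat)))
}

# Same plot as above, with the number of facets exposed as an argument.
graph_builder_n <- function(data_set, y_axis_parameter, category, n_groups = 6) {
  data_set <- add_facet_group(data_set, {{ category }}, n_groups)
  ggplot(data_set, aes(x = factor({{ category }}), y = {{ y_axis_parameter }})) +
    geom_point() +
    theme(axis.text.x = element_text(angle = 90, vjust = 0.5),
          strip.background = element_blank(),
          strip.text = element_blank()) +
    facet_wrap(vars(group), scales = "free_x", nrow = n_groups)
}

# Small stand-in data: 40 categories with 10 rows each
set.seed(42)
demo <- data.frame(column_A = rep(sample(300:1000000, 40), each = 10),
                   column_B = sample(5:25, 400, replace = TRUE))
p <- graph_builder_n(demo, column_B, column_A, n_groups = 4)
```

Printing p draws the four stacked panels. Keeping the grouping in its own helper also makes it easy to check the group sizes with count() before plotting.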