I am trying to build a data analytics dashboard using Shiny, which I am relatively new to. One of the features of my dashboard uses k-means clustering on user-generated data. I can get the clustering analysis to work fine, but I want to be able to do exploratory data analysis on individual clusters once the initial cluster analysis has been done. I would also like to do this with reactive data frames in Shiny, so that if the user changes a value on the dashboard, the analysis refreshes, including the post-clustering exploratory analysis.
Before anything, here are some functions that I use in the dashboard server code and relevant libraries, so run these first:-
#libraries===================================================================
library(ids)
library(tidyverse)
library(dplyr)
library(shiny)
library(ggplot2)
library(shinydashboard)
library(shinyWidgets)
library(factoextra)
#functions required==========================================================
#scale https://stackoverflow.com/questions/35775696/trying-to-use-dplyr-to-group-by-and-apply-scale
scale_this <- function(x){
  (x - mean(x, na.rm=TRUE)) / sd(x, na.rm=TRUE)
}
#wss plot
wssplot <- function(data, nc = 15, seed = 1234) {
  wss <- (nrow(data) - 1) * sum(apply(data, 2, var))
  for (i in 2:nc) {
    set.seed(seed)
    wss[i] <- sum(kmeans(data, centers = i)$withinss)
  }
  plot(1:nc,
       wss,
       type = "b",
       xlab = "Number of Clusters",
       ylab = "Within groups sum of squares")
}
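As a quick check of what the scale_this helper does, here is a minimal sketch on made-up values (the vector x is purely illustrative and not part of the dashboard data). It should give a result with mean 0 and standard deviation 1, matching base R's scale():

```r
# Same helper as above, repeated so this snippet runs on its own
scale_this <- function(x) {
  (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
}

x <- c(2, 4, 6, 8, 10)  # illustrative values only
z <- scale_this(x)

mean(z)  # centred on 0 (up to floating-point error)
sd(z)    # unit standard deviation

# Agrees with base R's scale(), which also divides by the n - 1 standard deviation
all.equal(z, as.numeric(scale(x)))
```

Note that a constant column has a standard deviation of 0, so scale_this would return NaN for it; that may be worth guarding against before passing the result to kmeans().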
Here is the code for the mock data frame for this example:-
#Create my mock data frame============================================
set.seed(123)
randomid<-random_id(333)#from 'ids' library
Duration<-c(floor(runif(10000, min=1, max=1000)))
mockdf<-cbind(randomid, Duration)
mockdf<-as.data.frame(mockdf)
mockdf$Duration<-as.numeric(mockdf$Duration)
My UI code:-
#UI============================================================================
ui <- fluidPage(
  titlePanel('Minimal example'),
  tabsetPanel(
    #=============================================kmeans clustering==================================================
    tabPanel("User Type Discovery",
      sidebarLayout(
        sidebarPanel(width = 4,
          numericInput('ksolution', 'Select k solution', 5),
          pickerInput('userselect', 'Which users do you want to include:',
                      choices = unique(mockdf$randomid),
                      options = list('actions-box' = TRUE), multiple = TRUE)),
        mainPanel(fluidRow(
          column(12, plotOutput("elbowplot")),
          column(12, plotOutput("clustplot")),
          column(12, plotOutput("clust_dens")),
          column(12, DT::dataTableOutput('Clusterdf'))))
      )
    )
  )
)
And my server code:-
#SERVER===========================================================
server <- function(input, output, session){
  #create reactive dataframe
  rval_df <- reactive({
    mockdf
  })
  #=============================================kmeans clustering==================================================
  rval_UserData <- reactive({
    rval_df() %>%
      filter(randomid %in% input$userselect) %>%
      group_by(randomid) %>%
      summarise(Count = n(), MeanDuration = mean(Duration), SDDuration = sd(Duration)) %>%
      mutate(SDDuration = if_else(is.na(SDDuration), 0, SDDuration),
             Cluster = as.factor(rval_kclust()$cluster))
  })
  #create a scaled dataset for the clustering
  rval_cluster_df <- reactive({
    rval_df() %>%
      filter(randomid %in% input$userselect) %>%
      group_by(randomid) %>%
      summarise(Count = n(), MeanDuration = mean(Duration), SDDuration = sd(Duration)) %>%
      mutate(SDDuration = if_else(is.na(SDDuration), 0, SDDuration),
             Count = scale_this(Count),
             MeanDuration = scale_this(MeanDuration),
             SDDuration = scale_this(SDDuration)) %>%
      select(Count, MeanDuration, SDDuration)
  })
  #cluster algorithm
  rval_kclust <- reactive({
    kmeans(rval_cluster_df(), centers = input$ksolution)
  })
  output$clustplot <- renderPlot({
    factoextra::fviz_cluster(rval_kclust(), data = rval_cluster_df())
  })
  output$elbowplot <- renderPlot({
    wssplot(rval_cluster_df())
  })
  output$Clusterdf <- DT::renderDataTable({
    rval_UserData()
  })
}
shinyApp(ui, server)
When you run shinyApp(ui, server), hit the "Select All" button in the drop-down box in the app to run the clustering.
Now, here is what I want to do. Since I have assigned the cluster number back onto rval_UserData(), I want to be able to merge the cluster number back onto mockdf, so I can generate plots using ggplot2 on the Duration variable and also generate summary tables, all at cluster level. I would prefer to do this using reactive data frames, so the plots will refresh depending on the ksolution input in the UI.
Here is my attempt to merge the cluster number back onto mockdf, followed by an attempt to draw a density plot:-
rval_cluster_merged_df <- reactive({
  merge(mockdf(), rval_UserData(), by="randomid")
  #outside of shiny, this would be a quick way to paste the cluster number back onto the mock dataframe
})
output$clust_dens <- renderPlot({
  dd <- rval_cluster_merged_df()
  ggplot(dd, aes(x=Duration, colour=Cluster, group=Cluster)) +
    geom_density() + ggtitle("Cluster density plot") + scale_x_log10()
})
And this is what I get, see the error message:-
It's probably something obvious that I am doing wrong, but any pointers in the right direction would be much appreciated! Thank you in advance :)
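You need to use req() for all the input$abc variables, and eval_tidy (from the rlang package, so load it with library(rlang) or call it as rlang::eval_tidy), as they are not standard variables. Note also that the merge must call the reactive rval_df() rather than mockdf(), since mockdf is a plain data frame and not a function. A minor update to your server function, as shown below, will solve your problem.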
server <- function(input, output, session){
  #create reactive dataframe
  rval_df <- reactive({
    mockdf
  })
  #=============================================kmeans clustering==================================================
  rval_UserData <- reactive({
    req(input$userselect)  #wait until at least one user is selected
    userselect <- rlang::eval_tidy(input$userselect)
    rval_df() %>%
      filter(randomid %in% userselect) %>%
      group_by(randomid) %>%
      summarise(Count = n(), MeanDuration = mean(Duration), SDDuration = sd(Duration)) %>%
      mutate(SDDuration = if_else(is.na(SDDuration), 0, SDDuration),
             Cluster = as.factor(rval_kclust()$cluster))
  })
  #create a scaled dataset for the clustering
  rval_cluster_df <- reactive({
    req(input$userselect)
    userselect <- rlang::eval_tidy(input$userselect)
    rval_df() %>%
      filter(randomid %in% userselect) %>%
      group_by(randomid) %>%
      summarise(Count = n(), MeanDuration = mean(Duration), SDDuration = sd(Duration)) %>%
      mutate(SDDuration = if_else(is.na(SDDuration), 0, SDDuration),
             Count = scale_this(Count),
             MeanDuration = scale_this(MeanDuration),
             SDDuration = scale_this(SDDuration)) %>%
      select(Count, MeanDuration, SDDuration)
  })
  #cluster algorithm
  rval_kclust <- reactive({
    req(input$ksolution)
    centers <- as.numeric(rlang::eval_tidy(input$ksolution))
    kmeans(rval_cluster_df(), centers = centers)
  })
  output$clustplot <- renderPlot({
    factoextra::fviz_cluster(rval_kclust(), data = rval_cluster_df())
  })
  output$elbowplot <- renderPlot({
    wssplot(rval_cluster_df())
  })
  output$Clusterdf <- DT::renderDataTable({
    rval_UserData()
  })
  #merge the cluster labels back onto the raw rows via the reactive rval_df()
  rval_cluster_merged_df <- reactive({
    merge(rval_df(), rval_UserData(), by="randomid")
  })
  output$clust_dens <- renderPlot({
    dd <- rval_cluster_merged_df()
    ggplot(dd, aes(x=Duration, colour=Cluster, group=Cluster)) +
      geom_density() + ggtitle("Cluster density plot") + scale_x_log10()
  })
}
Final output will be:
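To sanity-check the merge step on its own, outside Shiny, here is a minimal sketch on toy data. The ids a to d, their durations, and the names raw, per_user and merged are all made up for illustration. It uses base R's aggregate() in place of the dplyr pipeline so it runs without extra packages, and passes nstart = 10 to kmeans() only to make the toy result stable:

```r
# Toy stand-in for mockdf: several raw rows per user (hypothetical values)
set.seed(42)
raw <- data.frame(
  randomid = rep(c("a", "b", "c", "d"), times = c(3, 2, 4, 1)),
  Duration = c(10, 12, 11, 500, 520, 15, 14, 13, 16, 480)
)

# One row per user, playing the role of rval_UserData()
per_user <- aggregate(Duration ~ randomid, data = raw, FUN = mean)
names(per_user)[2] <- "MeanDuration"

# Cluster the users on the scaled summary, then store the label as a factor
km <- kmeans(scale(per_user$MeanDuration), centers = 2, nstart = 10)
per_user$Cluster <- as.factor(km$cluster)

# Attach each user's cluster label to every raw row belonging to that user
merged <- merge(raw, per_user[, c("randomid", "Cluster")], by = "randomid")

# Cluster-level summaries of the raw durations
table(merged$Cluster)                                      # rows per cluster
aggregate(Duration ~ Cluster, data = merged, FUN = mean)   # mean duration per cluster
```

Inside the app, the same kind of aggregate() call applied to rval_cluster_merged_df() could feed a table output, so the summary would refresh along with the ksolution input.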