I'm building a table based on gender data, which has three categorical values and a great deal of missing data. I converted the NAs to "Missing" to create a fourth categorical value. I want to arrange the table so that it sorts the categories from largest to smallest values, with the "Missing" category always on the bottom.
However, since this table will feed from a regularly updated dataset, the order of categories based on their values may change, and I always want whichever non-missing data category that has the largest value to be shown first, and the missing category to always be last. This makes using the slice()
function to arrange by index numbers not viable.
The code below produces a dataframe summarizing the categorical values, with "Missing" on top, as it is by default the largest category:
gender<-data.frame(peopledata$`Gender`)%>%
rename(`gender`=`peopledata.Gender`)%>%
mutate(`gender`= replace_na(`gender`,"Missing"))%>%
table()%>%
as.data.frame()%>%
rename(`count`=`Freq`)%>%
arrange(desc(`count`))
-----------------------------------------------------------------------------------------------
Output:
gender | count |
---|---|
Missing | 293 |
Male | 28 |
Female | 14 |
Undisclosed | 1 |
-----------------------------------------------------------------------------------------------
Input data (reproducing the summarized table from the gender data, for simplicity's sake):
peopledata<-data.frame(gender = c("Missing", "Male", "Female", "Undisclosed"), count= c(293, 28, 14, 1))
-----------------------------------------------------------------------------------------------
My question is focused on the arrange(desc(`count`))
line, and how to alter it so that the "Missing" row is always down on the bottom, and to otherwise sort the rest of the rows by the "count" column. Is there a conditional call l I can feed into arrange() to facilitate that?
You could move the Missing
row after sorting by count
.
library(dplyr)
peopledata |>
arrange(desc(count)) |>
slice(order(gender == 'Missing'))
#> gender count
#> 1 Male 28
#> 2 Female 14
#> 3 Undisclosed 1
#> 4 Missing 293
Or even simpler
peopledata |>
arrange(gender == 'Missing', desc(count))