I've got a dataset which looks like this:
platform | twitter_context | facebook_context | insta_context |
---|---|---|---|
Hashtag | NA | NA | |
NA | Facebook Group | NA | |
NA | NA | Public Figure | |
NA | NA | Hashtag | |
NA | A friend | NA | |
Someone I follow | NA | NA | |
… (more than 1,600 rows in total)
What I would like to achieve is a bar chart that compares the frequency of the categories in those "_context" columns by "platform".
I have used ggplot before to draw a bar chart that combines two variables. But here, the categories in the "_context" columns are similar, not identical.
As each context column only applies to one platform, I tried to merge the three context columns into a new column using the mutate function. However, I couldn't make it work properly: when I ran three mutate lines consecutively, the NAs would always overwrite the categories written by the previous line. I tried to solve this with if/else if conditions, so that only proper data would be pasted into the new column (and the NAs ignored). But this attempt failed because of my shaky grasp of the syntax.
I suppose there must be a way to get this right, but I couldn't find it. (Did I mention I am quite new to this?)
My intention was that I could then plot a chart using the new "all_contexts" column and split it up on the x axis by platform. (The labelling would still be a mess, but possibly that could be fixed by applying levels.)
A different approach I could imagine would be to have ggplot draw three independent bar charts, which would then have to be standardized manually, unless there is a way to "concatenate" them in a single plot.
Very likely this rookie problem has already been covered in a thread that I was unable to find. Can someone point me in the right direction? I appreciate your help!
There are a number of ways to transform your data to prepare it for the plot that you want to create. One way is illustrated here, where we use pivot_longer() to stack the three context columns, remove the rows where the context is NA, and then count the number of rows by platform and context:
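    library(dplyr)
    library(tidyr)

    ggdata <- df %>%
      # stack the three *_context columns into one "context" column
      pivot_longer(cols = ends_with("context"), names_to = "p", values_to = "context") %>%
      # drop the rows where the context does not apply to the platform
      filter(!is.na(context)) %>%
      # one row per platform/context pair, with its frequency in column n
      count(platform, context)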
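Since the question describes trying to merge the three columns with mutate(), here is a minimal sketch of that route using dplyr's coalesce(), which takes the first non-missing value in each row. The toy df below, including its platform values, is made up for illustration, and ggdata_alt is just a placeholder name. The sketch also assumes the NAs are real missing values rather than the string "NA".

```r
library(dplyr)

# Toy data (hypothetical; the platform values are assumed for illustration)
df <- data.frame(
  platform         = c("Twitter", "Facebook", "Instagram", "Instagram", "Facebook", "Twitter"),
  twitter_context  = c("Hashtag", NA, NA, NA, NA, "Someone I follow"),
  facebook_context = c(NA, "Facebook Group", NA, NA, "A friend", NA),
  insta_context    = c(NA, NA, "Public Figure", "Hashtag", NA, NA),
  stringsAsFactors = FALSE
)

ggdata_alt <- df %>%
  # coalesce() returns, row by row, the first value that is not NA
  mutate(all_contexts = coalesce(twitter_context, facebook_context, insta_context)) %>%
  # drop rows where all three context columns were NA
  filter(!is.na(all_contexts)) %>%
  # same shape as ggdata: platform, context, n
  count(platform, context = all_contexts)
```

This should give the same platform, context and n columns as the pivot_longer() version, as long as each row has a value in at most one context column.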
Now you can pass ggdata directly to ggplot() using geom_col(), or you could first add rows for the platform/context combinations that are not represented.
Here is an example of the former approach:
    library(ggplot2)

    ggplot(ggdata, aes(platform, n, fill = context)) +
      geom_col(position = "dodge")
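For the latter approach (adding the platform/context combinations that are not represented), one possible sketch uses tidyr's complete(). The toy_counts frame and its numbers are invented stand-ins for ggdata, and ggdata_full is a placeholder name.

```r
library(dplyr)
library(tidyr)

# Toy counts (hypothetical values, standing in for ggdata)
toy_counts <- data.frame(
  platform = c("Facebook", "Instagram", "Twitter"),
  context  = c("A friend", "Hashtag", "Hashtag"),
  n        = c(3L, 2L, 5L),
  stringsAsFactors = FALSE
)

# complete() adds every platform/context combination that is missing
# from the data and sets n to 0 for the rows it adds
ggdata_full <- toy_counts %>%
  complete(platform, context, fill = list(n = 0L))
```

Passing a completed frame like this to the same ggplot() call gives every platform a slot for every context, so the dodged bars keep a uniform width and contexts that do not occur on a platform show up as empty slots.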