I have a database that comes from a survey, and from this database I constructed a dataframe in R, that looks similar to this:
cnt <- as.factor(c("Country 1", "Country 2", "Country 3", "Country 1", "Country 2", "Country 3"))
bnk <- as.factor(c("bank 1", "bank 2", "bank 3", "bank 1", "bank 2", "bank 3"))
qst <- as.factor(c("q1", "q1", "q1", "q2", "q2", "q2"))
ans <- as.numeric(c(1, 1, 2, 1, 2, 3))
df <- data.frame(cnt, bnk, qst, ans)
names(df) <- c("Country", "Institute", "Question", "Answer")
    Country Institute Question Answer
1 Country 1    bank 1       q1      1
2 Country 2    bank 2       q1      1
3 Country 3    bank 3       q1      2
4 Country 1    bank 1       q2      1
5 Country 2    bank 2       q2      2
6 Country 3    bank 3       q2      3
Essentially, this dataframe shows that there are two different questions, q1 and q2, and that the participants (here, banks from different countries) have to respond to each question on a numeric scale.
My goal is very simple. For each question, I want to calculate and then plot the percentage of banks that responded with 1, the percentage that responded with 2, and so on.
So, in our example, there are three banks. For question 1, two of them answered 1 and one answered 2. So I want to visualize, e.g. through a bar chart, that 2/3 of the banks (approx. 67%) answered 1 and 1/3 (approx. 33%) answered 2. Similarly for question 2.
I'm not sure whether it matters, but the range of possible numeric answers may vary by question. That is, for q1 the available answers range from 1 to 2, but for q2 they might range from 1 to 5.
Can someone suggest how I can quickly implement this in R?
Of course, one dirty way is to count the number of banks, count the number of "ones" in q1 (q2), and then calculate the respective fractions. This method, however, is very time-consuming, and I am wondering whether there are better options available in R.
UPDATE
Having done all of the above, I want to create, for a couple of questions, a bar chart that looks like this:
In the above example, the responses to question 8 that were equal to 1 are labeled "My bank has being ...." and the responses that were equal to 2 are labeled "My bank has being started ...", as the chart shows.
Nevertheless, we can ignore the labeling part for the moment, as putting only 1 and 2 on the x-axis will be sufficient.
Here's a quick answer with ggplot
library(ggplot2)
ggplot(df, aes(x=Question, fill=factor(Answer))) + geom_bar()
The output looks like this:
To calculate the percentage:
library(dplyr)
library(tidyr)
(dat <- df %>% spread(Question, Answer))
    Country Institute q1 q2
1 Country 1    bank 1  1  1
2 Country 2    bank 2  1  2
3 Country 3    bank 3  2  3
dat$q1 %>% table/nrow(dat)
        1         2
0.6666667 0.3333333
dat$q2 %>% table/nrow(dat)
        1         2         3
0.3333333 0.3333333 0.3333333
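The same percentages can also be computed for all questions at once, without spreading to wide format first. The following is only a minimal sketch, assuming the long-format data from the question; the names pct, n and Percent are introduced here for illustration and are not part of the original answer.

library(dplyr)

# Rebuild the example data from the question (long format: one row per bank and question)
df <- data.frame(
  Country   = rep(c("Country 1", "Country 2", "Country 3"), times = 2),
  Institute = rep(c("bank 1", "bank 2", "bank 3"), times = 2),
  Question  = rep(c("q1", "q2"), each = 3),
  Answer    = c(1, 1, 2, 1, 2, 3)
)

pct <- df %>%
  count(Question, Answer) %>%        # number of banks per question/answer pair
  group_by(Question) %>%
  mutate(Percent = n / sum(n)) %>%   # share of banks within each question
  ungroup()

pct

Because the shares are computed within each question, it does not matter that q1 and q2 have different answer ranges. In base R, prop.table(table(df$Question, df$Answer), 1) should give the same row-wise proportions as a matrix.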
Edit: Added a plot for the comment below
ggplot(df, aes(x=Answer, fill=factor(Question))) + geom_bar()
Edit: Added to address the Updated question:
# Made-up data for a single question: 97 banks answered 1 and 3 answered 2
# (note that this overwrites the earlier df)
df <- data.frame(answer = c(rep(1, 97), rep(2, 3)))
# after_stat(count) replaces the older ..count.. notation (needs ggplot2 >= 3.3.0)
ggplot(df, aes(x = as.factor(answer))) +
  geom_bar(aes(y = after_stat(count / sum(count))), width = .5) +
  scale_y_continuous(labels = scales::percent) +
  geom_text(aes(y = after_stat(count / sum(count)), label = after_stat(scales::percent(count / sum(count)))), stat = "count", vjust = -0.25) +
  labs(title = "Question 8", y = "Percent", x = "") +
  scale_x_discrete(labels = c("My bank has been using \n guarantees already for \n more than 5 years", "My bank has started to use \n guarantees in the last 5 years"))
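To draw the same kind of percentage chart for every question in the original survey data, one option is to precompute the shares and facet by question. This is a hedged sketch rather than a definitive solution: survey, pct and p are hypothetical names, and the data is the small example from the question rather than real survey results.

library(dplyr)
library(ggplot2)

# Example data from the question, reduced to the two columns needed here
survey <- data.frame(
  Question = rep(c("q1", "q2"), each = 3),
  Answer   = c(1, 1, 2, 1, 2, 3)
)

# Share of banks giving each answer, computed within each question
pct <- survey %>%
  count(Question, Answer) %>%
  group_by(Question) %>%
  mutate(Percent = n / sum(n)) %>%
  ungroup()

# One panel per question; free_x lets each panel show only its own answer values
p <- ggplot(pct, aes(x = factor(Answer), y = Percent)) +
  geom_col(width = 0.5) +
  geom_text(aes(label = scales::percent(Percent, accuracy = 1)), vjust = -0.25) +
  scale_y_continuous(labels = scales::percent) +
  facet_wrap(~ Question, scales = "free_x") +
  labs(x = "Answer", y = "Percent of banks")

p

Precomputing the shares keeps the plotting code simple, since geom_col() just draws the values it is given and the percentages are guaranteed to sum to 100% within each panel.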