My Excel data contains 36 factors (essentially yes/no questions) collected from users. Is there any way to run a cluster analysis on these answers? I tried using the iris example as a reference, but since my data is entirely text-based, I am still trying to figure out a way.
The data would look like this:
Q 1 Q 2 Q 3 Q 4 Q 5
People 1 Yes Yes Yes Yes Yes
People 2 No Yes No Yes No
People 3 No No No No No
People 4 Yes No Yes No Yes
People 5 No Yes No Yes No
People 6 Yes No Yes No Yes
People 7 No Yes No Yes No
For the factor analysis itself I refer you to online blogs, Cross Validated (Stack Exchange), or other resources; here I show one approach for making your data numeric.
Here is how I reproduced your data:
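library(tidyverse)
# read_table() splits on whitespace, so "Person ID" and "People 1" each become two columns
df <- read_table("Person ID Q1 Q2 Q3 Q4 Q5
People 1 Yes Yes Yes Yes Yes
People 2 No Yes No Yes No
People 3 No No No No No
People 4 Yes No Yes No Yes
People 5 No Yes No Yes No
People 6 Yes No Yes No Yes
People 7 No Yes No Yes No") %>%
  # glue the two pieces back into a single identifier, e.g. "People1"
  unite("PersonID", Person, ID, sep = "")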
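Now you need to convert the text to factors and then to numeric data. Factor levels are sorted alphabetically, so No becomes 1 and Yes becomes 2.
# assign the result back to df so that later steps see the numeric columns
df <- df %>%
  mutate_if(grepl("Q", names(.)), as.factor) %>%
  mutate_if(is.factor, as.numeric)
df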
Output is:
# A tibble: 7 x 6
  PersonID    Q1    Q2    Q3    Q4    Q5
  <chr>    <dbl> <dbl> <dbl> <dbl> <dbl>
1 People1      2     2     2     2     2
2 People2      1     2     1     2     1
3 People3      1     1     1     1     1
4 People4      2     1     2     1     2
5 People5      1     2     1     2     1
6 People6      2     1     2     1     2
7 People7      1     2     1     2     1
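If you would rather not depend on the alphabetical ordering of the factor levels, you could also code the answers explicitly as 0/1. This is only a minimal base R sketch; the small `answers` data frame and the `q_cols` helper are made up for illustration and are not part of your data.

```r
# Hypothetical toy data with the same shape as the question's table
answers <- data.frame(
  PersonID = paste0("People", 1:3),
  Q1 = c("Yes", "No", "No"),
  Q2 = c("Yes", "Yes", "No"),
  stringsAsFactors = FALSE
)

# Pick every question column by its "Q" prefix
q_cols <- grep("^Q", names(answers), value = TRUE)

# "Yes" becomes 1 and anything else becomes 0, independent of level order
answers[q_cols] <- lapply(answers[q_cols], function(x) as.integer(x == "Yes"))

answers
```

With this coding a column mean is simply the share of "Yes" answers for that question.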
Now you can perform a correlation, which you might need for your factor analysis:
df %>%
  select(-1) %>%
  cor()
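Since the question was about cluster analysis, here is a rough sketch of how the numeric answers could be fed into a hierarchical clustering using only base R. The distance (Manhattan on 0/1 data, i.e. the number of questions on which two people disagree), the average linkage, and the choice of two clusters are my assumptions, not something your data dictates; the names `m`, `d`, `hc` and `groups` are just for illustration.

```r
# Rebuild the example answers as a 0/1 matrix (1 = Yes, 0 = No)
m <- rbind(
  People1 = c(1, 1, 1, 1, 1),
  People2 = c(0, 1, 0, 1, 0),
  People3 = c(0, 0, 0, 0, 0),
  People4 = c(1, 0, 1, 0, 1),
  People5 = c(0, 1, 0, 1, 0),
  People6 = c(1, 0, 1, 0, 1),
  People7 = c(0, 1, 0, 1, 0)
)
colnames(m) <- paste0("Q", 1:5)

# On 0/1 data the Manhattan distance counts the differing answers
d <- dist(m, method = "manhattan")

# Average-linkage hierarchical clustering on that distance matrix
hc <- hclust(d, method = "average")

# Cut the tree into two groups (the number of groups is an assumption)
groups <- cutree(hc, k = 2)
groups
```

You can inspect the tree with `plot(hc)` before settling on a number of clusters. With 36 questions the same code applies once the matrix has 36 columns.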
Hope that approach helps.