I have several CSV files, which I'm reading like this:
files <- list.files("D:/...", pattern = "L01")
for (x in files) {
  assign(x, read.csv(x, header=TRUE, sep=",", skip=92))
}
What I would like to do next is split each file by the levels of a column named "Case" and, for each level, draw a bar plot of the mean values of the remaining columns. So if I have 2 files, 50 factor levels and 26 columns, I should end up with 100 plots, each with 26 bars.
So for each file I will need something like:
Cases <- factor(x$Cases)
This would have to be done for every file, followed by one plot per factor level with 26 bars each.
Hope this is clear.
Thanks for any suggestions.
For example, each file looks like this:
AAA col1 col2 col3 ....
AAA
BBB
BBB
CCC
CCC
DDD
DDD
EEE
EEE
AAA
AAA
BBB
BBB
CCC
CCC
DDD
DDD
EEE
EEE
So the factors are AAA, BBB, CCC, DDD, EEE. For each file, I need to plot the mean of each column for each of these factors.
Thanks for your support.
Your question is not worded very clearly, but something like this might get you started:
# First, some sample data
set.seed(1)
df = data.frame(Cases = sample(LETTERS[1:5], 20, replace=TRUE),
                Set1 = sample(4:10, 20, replace=TRUE),
                Set2 = sample(6:19, 20, replace=TRUE),
                Set3 = sample(1:20, 20, replace=TRUE),
                Set4 = sample(5:16, 20, replace=TRUE))
# Use aggregate to find means by group
temp = aggregate(df[-1], by=list(df$Cases), mean)
# Plot
# par(mfrow=c(2, 2)) # Just for demonstration; used for the attached image
lapply(temp[-1], barplot, names.arg = temp$Group.1)
dev.off() # Reset the graphics device if you've changed par.
This gives you one bar plot per column (Set1 to Set4), each with one bar per case.
After reading your question again, I think that I misunderstood how you wanted to do your groupings. The following uses apply to plot by rows instead of columns.
par(mfrow=c(2, 3)) # Just for demonstration
apply(temp[-1], 1, barplot)
dev.off() # Reset the graphics device
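One drawback of the apply version is that the plots carry no titles, so it is hard to tell which case each one belongs to. A minimal sketch of one way around that is below; `plot_case_means` is a hypothetical helper name of my own, and the sample data is a cut-down version of the data frame used earlier.

```r
# Sample data, same shape as in the example above (fewer columns for brevity)
set.seed(1)
df <- data.frame(Cases = sample(LETTERS[1:5], 20, replace = TRUE),
                 Set1 = sample(4:10, 20, replace = TRUE),
                 Set2 = sample(6:19, 20, replace = TRUE))

# Mean of every value column per case; naming the list element
# names the grouping column in the result ("Cases" instead of "Group.1")
means <- aggregate(df[-1], by = list(Cases = df$Cases), FUN = mean)

# Hypothetical helper (not part of base R): one titled bar plot per row of `means`
plot_case_means <- function(means, group_col = "Cases") {
  value_cols <- setdiff(names(means), group_col)
  for (i in seq_len(nrow(means))) {
    barplot(unlist(means[i, value_cols, drop = FALSE]),
            main = as.character(means[[group_col]][i]),
            ylab = "Mean")
  }
  invisible(nrow(means))  # number of plots drawn
}

# Usage (draws on the current graphics device):
# par(mfrow = c(2, 3))
# plot_case_means(means)
```

Each plot then has one bar per column and the case name as its title.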
If you want to combine some of the factors, I would suggest creating a new factor variable before splitting. So, for instance, if you wanted to split by "A+B", "C", "D", and "E" (four groups instead of five), you can do something like the following:
# Create a new factor variable
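# Work on a copy so you don't overwrite your original data. factor() is needed
# because data.frame() keeps strings as character by default in R >= 4.0.
df$Cases_2 = factor(df$Cases)
levels(df$Cases_2) <- ifelse(levels(df$Cases_2) %in% c("A","B"),
                             "AB", levels(df$Cases_2))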
# Proceed almost as before
temp = aggregate(df[-c(1, 6)], by=list(df$Cases_2), mean)
apply(temp[-1], 1, barplot)
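To apply the same idea to several files at once, it may be easier to read them into a named list rather than using assign, and then loop over that list. The following is only a sketch under some assumptions: the helper names (`read_case_files`, `case_means`, `plot_all`) are made up for illustration, the grouping column is assumed to be called "Case", and every other column is assumed to be numeric.

```r
# Hypothetical helper: read every matching CSV in a directory into a named list.
# `skip` defaults to 92 as in the question.
read_case_files <- function(data_dir, pattern = "L01", skip = 92) {
  files <- list.files(data_dir, pattern = pattern, full.names = TRUE)
  data_list <- lapply(files, read.csv, header = TRUE, sep = ",", skip = skip)
  names(data_list) <- basename(files)
  data_list
}

# Hypothetical helper: mean of every other column for each level of `case_col`
case_means <- function(d, case_col = "Case") {
  value_cols <- setdiff(names(d), case_col)
  aggregate(d[value_cols], by = list(Case = d[[case_col]]), FUN = mean)
}

# Hypothetical helper: one bar plot per file and per Case level,
# titled "<file name>: <case>"
plot_all <- function(data_list, case_col = "Case") {
  for (nm in names(data_list)) {
    means <- case_means(data_list[[nm]], case_col)
    for (i in seq_len(nrow(means))) {
      barplot(unlist(means[i, -1, drop = FALSE]),
              main = paste(nm, means$Case[i], sep = ": "))
    }
  }
  invisible(NULL)
}

# Usage, with the directory from the question:
# data_list <- read_case_files("D:/...")
# plot_all(data_list)
```

With 2 files and 50 levels of "Case", this would draw 100 plots, so you may want to open a pdf() device first and page through the result.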