R multiple file "split and plot"


I have several CSV files, which I'm reading like this:

files <- list.files("D:/...", pattern = "L01", full.names = TRUE)
for (x in files) {
  # One data frame per file, stored under the file's base name
  assign(basename(x), read.csv(x, header = TRUE, sep = ",", skip = 92))
}

What I would like to do next is split each file by the levels of a column named "Case" and, for each level, draw a bar plot of the mean values of all the remaining columns. So if I have 2 files, 50 factor levels, and 26 columns, I will end up with 100 plots, each with 26 bars.

So for each file I will need something like:

Cases  <- factor(x$Cases)

but applied to every file, followed by one plot per factor level with 26 bars each.

Hope this is clear.

Thanks for any suggestion.

For example, each file looks like this:

AAA  col1   col2  col3   ....  
AAA             
BBB  
BBB         
CCC  
CCC    
DDD  
DDD    
EEE  
EEE    
AAA  
AAA     
BBB  
BBB      
CCC  
CCC    
DDD  
DDD    
EEE  
EEE    

So the factor levels are AAA, BBB, CCC, DDD, and EEE. For each file, I need to plot the mean of each column for each of these levels.

Thanks for support.


Solution

  • Your question is not worded very clearly, but something like this might get you started:

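    # First, some sample data
    # (Cases is made a factor explicitly: since R 4.0.0, data.frame() no longer
    # converts strings to factors by default)
    set.seed(1)
    df = data.frame(Cases = factor(sample(LETTERS[1:5], 20, replace=TRUE)),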
                    Set1 = sample(4:10, 20, replace=TRUE),
                    Set2 = sample(6:19, 20, replace=TRUE),
                    Set3 = sample(1:20, 20, replace=TRUE),
                    Set4 = sample(5:16, 20, replace=TRUE))
    
    # Use aggregate to find means by group
    temp = aggregate(df[-1], by=list(df$Cases), mean)
    
    # Plot
    # par(mfrow=c(2, 2)) # Just for demonstration; used for the attached image
    lapply(temp[-1], barplot, names.arg = temp$Group.1)
    dev.off() # Reset the graphics device if you've changed par.
    

    This gives you one bar plot per column (Set1 to Set4), each with one bar per level of Cases.

    Update

    After reading your question again, I think I misunderstood how you wanted to do your groupings. The following uses apply to plot by rows instead of columns, so you get one plot per level of Cases with one bar per column.

    par(mfrow=c(2, 3)) # Just for demonstration 
    apply(temp[-1], 1, barplot)
    dev.off() # Reset the graphics device
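
    The row-wise plots produced by apply carry no indication of which level each one belongs to. A minimal sketch of one way to add titles, assuming the same kind of sample data as above (the names `means` and `case` are only illustrative):

    ```r
    # Self-contained sample data, same shape as the example above
    set.seed(1)
    df <- data.frame(Cases = factor(sample(LETTERS[1:5], 20, replace = TRUE)),
                     Set1 = sample(4:10, 20, replace = TRUE),
                     Set2 = sample(6:19, 20, replace = TRUE))

    # Mean of each column per level of Cases
    temp <- aggregate(df[-1], by = list(Cases = df$Cases), FUN = mean)

    # Turn the means into a matrix with the levels as row names
    means <- as.matrix(temp[-1])
    rownames(means) <- as.character(temp$Cases)

    # One bar plot per level, titled with the level name
    for (case in rownames(means)) {
      barplot(means[case, ], main = case, ylab = "Mean")
    }
    ```

    Each bar is labelled with its column name because the rows of `means` are named vectors.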
    

    Update [to answer some of the questions in comments]

    If you want to combine some of the factor levels, I would suggest creating a new factor variable before splitting. For instance, if you wanted to split by "AB" (A and B combined), "C", "D", and "E" (four groups instead of five), you could do something like the following:

    # Create a new factor variable
    # factor() makes sure levels() works even if Cases was read in as character
    df$Cases_2 = factor(df$Cases) # So you don't overwrite your original data
    levels(df$Cases_2) <- ifelse(levels(df$Cases_2) %in% c("A","B"),
                                 "AB", levels(df$Cases_2))
    # Proceed almost as before
    temp = aggregate(df[-c(1, 6)], by=list(df$Cases_2), mean)
    apply(temp[-1], 1, barplot)
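
    To cover the "for each file" part of the question, the same idea can be applied to a list of data frames. The following is only a sketch under stated assumptions: the two CSV files are made-up stand-ins written to a temporary directory, the grouping column is assumed to be called Cases, and the names `data_list` and `means_list` are hypothetical. For real files you would point list.files at your own directory and pass the skip value you need to read.csv.

    ```r
    # Stand-in input: two small CSV files in a temporary directory (made-up data)
    dir <- file.path(tempdir(), "split_plot_demo")
    dir.create(dir, showWarnings = FALSE)
    set.seed(1)
    for (f in c("L01_a.csv", "L01_b.csv")) {
      demo <- data.frame(Cases = sample(c("AAA", "BBB", "CCC"), 12, replace = TRUE),
                         col1 = runif(12), col2 = runif(12), col3 = runif(12))
      write.csv(demo, file.path(dir, f), row.names = FALSE)
    }

    # Read every matching file into a named list instead of using assign()
    files <- list.files(dir, pattern = "L01", full.names = TRUE)
    data_list <- lapply(files, read.csv, header = TRUE)
    names(data_list) <- basename(files)

    # Mean of every remaining column per level of Cases, one data frame per file
    means_list <- lapply(data_list, function(d) {
      aggregate(d[setdiff(names(d), "Cases")], by = list(Cases = d$Cases), FUN = mean)
    })

    # One bar plot per file and per level of Cases, titled "file: level"
    for (f in names(means_list)) {
      m <- means_list[[f]]
      for (i in seq_len(nrow(m))) {
        barplot(unlist(m[i, -1]), main = paste(f, m$Cases[i], sep = ": "))
      }
    }
    ```

    Keeping the data frames in a list rather than creating one variable per file means the split-and-plot step can be written once and applied to every file.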