Search code examples
rr-xlsx

R generate multiple Excel files from a dataset, based on conditions from another


I've got a dataset with feedback comments on multiple criteria from a customer survey conducted on many sites, where each row represents a single response.

For simplicity's sake, I have simplified the original dataset and produced a reproducible dataframe with comments for only three sites.

The criteria are listed from columns 4 - 10.

comments = data.frame(RESPONDENT_ID=c(1,2,3,4,5,6,7,8),
             REGION=c("ASIA","ASIA","ASIA","ASIA","ASIA","EUROPE","EUROPE","EUROPE"),
             SITE=c("Tokyo Center","Tokyo Center","Tokyo Center","PB Tower","PB Tower","Rome Heights","Rome Heights","Rome Heights"),
             Lighting=c("Dim needs to be better","","Good","I don't like it","Could be better","","",""),
             Cleanliness=c("","very clean I'm happy","great work","","disappointed","I like the work","","nice"),
             Hygiene=c("","happy","needs improvement","great","poor not happy","nice!!","clean as usual i'm never disappointed",""),
             Service=c("great service","impressed","could do better","","","need to see more","cant say","meh"),
             Punctuality=c("always on time","","loving it","proper and respectful","","","punctual as always","delays all the time!"),
             Efficiency=c("generally efficient","never","cannot comment","","","","","happy with this"),
             Motivation=c("always very motivated","driven","exceeds expectations","","poor service","ok can do better","hmm","motivated"))

I've got a second dataset, which contains the bottom 3 scoring criteria for each of the three sites.

bottom = data.frame(REGION=c("ASIA","ASIA","EUROPE"),
             SITE=c("Tokyo Center","PB Tower","Rome Heights"),
             BOTTOM_1=c("Lighting","Cleanliness","Motivation"),
             BOTTOM_2=c("Hygiene","Service","Lighting"),
             BOTTOM_3=c("Motivation","Punctuality","Cleanliness"))                 

My Objective:

1) From the comments dataframe, for each SITE, I'd like to filter the bottom dataframe, and extract the comments for the bottom 3 criteria per site only.

2) Based on this extraction, for each unique SITE, I'd like to create an Excel file with three sheets, each sheet named after the bottom 3 criteria for that given site.

3) Each Sheet would contain a list of comments extracted for that particular site.

4) I'd like all Excel files saved in the format:

REGION_SITE_Comments2017.xlsx

Desired Final Output:

3 Excel files (or as many files as there are unique sites), each Excel file having three tabs named after their bottom 3 criteria, and each sheet with a list of comments corresponding to the given criterion for that site.

So as an example, one of the three files generated would look like this:

  • The file name would be ASIA_TokyoCenter_Comments2017.xlsx
  • The file would contain 3 sheets, "Lighting","Hygiene" & "Motivation" (based on the three bottom criteria for this site)
  • Each of these sheets would contain their respective site-level comments.

My Methodology:

I tried using a for loop on the comments dataframe, and filtering the bottom dataframe for each site listed.

Then using the write.xlsx function from the xlsx package to generate the Excel files, with the sheetName argument set to each of the bottom three citeria per site.

However I cannot seem to get the desired results. I have searched on Stackoverflow for similar solutions, but haven't found anything yet.

Any help with this would be highly appreciated!


Solution

  • This probably can be formatted better... But for each level in Region and Site, for each 'bottom', we extract each independent combination and write to file.

    bottom <- sapply(bottom, as.character) # Get out of factors.
    sp <- split(comments, comments$REGION) # Split data into a list format for ease.
    for(i in unique(bottom[,1])){
       for(j in unique(bottom[,2])){
           x <- sp[[1]][sp[[i]][,3]==j,]
           y <-  x[,colnames(x)%in%bottom[bottom[,1]==i& bottom[,2]==j,3:5]]
           for(q in colnames(y)){
           if(nrow(x) > 0) {
             write.xlsx(x=y[,q],
                        file=paste(i,j, 'Comments2017.xlsx', sep='_'),
                        sheetName=q, append=T)
           }
         }
       }
     }
    

    Is this what you were looking for?