Check for text in nested lists in R


I have a nested data frame in which one column (Reviews) is a list of data frames, each with review_text, review_rating, and review_date columns. The dput() output is shown below.

structure(list(Name = c("Afsondering Clinic", "The Local Choice Pharmacy Bergview"
), Reviews = list(structure(list(review_text = c("No Review Text", 
"No Review Text", "Given the poor standard of living in Eastern Cape - S.A not to mention the inefficiency in public sectors. This clinic truly thrives for excellence but must say: there is forever no medicine nor pills. Wonderful staff indeed, very helpful regardless of the state of affairs in the Makhoba village. Because of NO water NOR electricity, toilets don't flash etc 🙈. No play area for kids. Vaccines are done here."
), review_rating = c(5L, 5L, 4L), review_date = c("2020-07-03 07:12:13 +00:00", 
"2019-07-03 07:12:13 +00:00", "2019-07-03 07:12:13 +00:00")), class = "data.frame", row.names = c(NA, 
3L)), structure(list(review_text = c("Excellent service", "Went to Bergview Pharmacy looking for Liquid chlorophyll, I asked the lady who’s at the till on your way out, she’s light in complexion,had braids,her makeup done. What a rude and uncouth human being...", 
"A little on the expensive side but in general the staff that work's there are experienced and quick to answer and helpful. I will definitely recommend this pharmacy to all people.", 
"No Review Text", "Quick attendence friendly good carring"), review_rating = c(5L, 
1L, 5L, 5L, 5L), review_date = c("2024-05-03 07:12:15 +00:00", 
"2024-01-03 07:12:15 +00:00", "2023-11-03 07:12:15 +00:00", "2022-07-03 07:12:15 +00:00", 
"2021-07-03 07:12:15 +00:00")), class = "data.frame", row.names = c(NA, 
5L)))), row.names = 1:2, class = "data.frame")

I want to check whether each place has at least one review that is not "No Review Text", and then filter the data frame to keep only those places. I am struggling to access the review_text elements without using sapply. How can I access them directly so I can do this check? Here is the code I want to use it in:

facilities_with_reviews <- review_data %>% 
  filter(!is.na(Information$rating)) %>%
  filter(path to review_text != "No Review Text")

PS: I have tried the plyr $ syntax and plain [] indexing, but I can't get either to work.


Solution

  • I think the relevant tidyverse trick is rowwise grouping with mutate, since the data frames are nested at the row level of the given data. You could go a different route and unnest, but if you keep the nesting you can still extract the relevant information and do all kinds of operations on it. For example:

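    library(dplyr)

    # some_data stands for the nested data frame from the question
    mutate(rowwise(some_data),
      # number of distinct review texts, placeholder included
      reviews_df_unique_review_text_count = length(unique(Reviews$review_text)),
      # distinct real review texts: as a count, a list column, and one string
      reviews_count_after_exclude_no_review_text = length(setdiff(unique(Reviews$review_text), "No Review Text")),
      list_of_review_texts = list(setdiff(unique(Reviews$review_text), "No Review Text")),
      review_texts_pasted_together = paste(setdiff(unique(Reviews$review_text), "No Review Text"), collapse = "; ")
    )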
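
To get from there to the filter asked about in the question, one option is to keep the rowwise grouping and test each place's review_text vector with any(). This is a minimal sketch, assuming dplyr 1.0 or later. The two-row review_data below is a cut-down stand-in for the question's data, and "Empty Place" is a hypothetical row added only to show a place being dropped.

```r
library(dplyr)

# Cut-down stand-in for the question's data: one nested data frame per place.
# "Empty Place" is hypothetical and has only placeholder reviews.
review_data <- tibble(
  Name = c("Afsondering Clinic", "Empty Place"),
  Reviews = list(
    data.frame(
      review_text = c("No Review Text", "Wonderful staff indeed"),
      review_rating = c(5L, 4L)
    ),
    data.frame(
      review_text = c("No Review Text", "No Review Text"),
      review_rating = c(5L, 5L)
    )
  )
)

# Under rowwise(), Reviews refers to the nested data frame of the current row,
# so Reviews$review_text is an ordinary character vector.
facilities_with_reviews <- review_data %>%
  rowwise() %>%
  filter(any(Reviews$review_text != "No Review Text")) %>%
  ungroup()

facilities_with_reviews$Name
```

With this input only "Afsondering Clinic" is kept. The ungroup() call at the end drops the rowwise grouping so that later verbs operate on whole columns again.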