Search code examples
rreshapereshape2

Sequentially rename duplicate value in character variable before reshaping it with dcast


I'm scraping cars information from a website and I'm getting not constant and not so clean data from it. I'm trying to clean and arrange this data into a data frame.

For example:

dd <- data.frame(measure = c("wheel", "wheel", "length", "width", "wheel", "width"), value = 1:6, model = "a", stringsAsFactors = F)
dd
  measure value model
1   wheel     1     a
2   wheel     2     a
3  length     3     a
4   width     4     a
5   wheel     5     a
6   width     6     a

It this example, I have 3 values of wheel and 2 of width. In my real data, it's not always the same thing that is repeated, it may or may not have repetition and it could be repeated more than once.

I need to reshape this table to have one line per model, however I don't want to aggregate the value that have a common measure. Precisely, I would want the table to become:

  model length wheel wheel1 wheel2 width width1
1     a      3     1      2      5     4      6

This was obtained using dcast on manually modified data:

library(reshape2)    
res <- data.frame(measure = c("wheel", "wheel1", "length", "width", "wheel2", "width1"), value = 1:6, model = "a", stringsAsFactors = F)
dcast(res, model ~ measure)

I need either a way to modify dcast so it doesn't aggregate the measure or automatically modify dd so it becomes res.

I've tried something ugly and not exactly what I needed:

dd[duplicated(dd$measure), "measure"] <- paste0(dd[duplicated(dd$measure), "measure"] , 1:3)
dd
  measure value model
1   wheel     1     a
2  wheel1     2     a
3  length     3     a
4   width     4     a
5  wheel2     5     a
6  width3     6     a

This code isn't working because width get the index 3 and not 2. Also, this would not adjust to another table like:

dd2 <- data.frame(measure = c("wheel", "wheel", "length", "width", "wheel"), value = 1:5, model = "a", stringsAsFactors = F)
dd2[duplicated(dd2$measure), "measure"] <- paste0(dd2[duplicated(dd2$measure), "measure"] , 1:3)
Error in `[<-.data.frame`(`*tmp*`, duplicated(dd2$measure), "measure",  : 
  replacement has 3 rows, data has 2

Anyway, how could I modify my variable measure dynamically so all the words are unique?


Solution

  • Can you use dplyr::mutate as below:

    dd <- dd %>%
      group_by(model, measure) %>%
      mutate(measure2 = paste0(measure, ifelse(row_number() > 1, row_number() - 1, ""))) %>%
      ungroup() %>%
      mutate(measure = measure2) %>%
      select(measure, model, value)
    dd
    # A tibble: 6 x 3
      measure model value
      <chr>   <chr> <int>
    1 wheel   a         1
    2 wheel1  a         2
    3 length  a         3
    4 width   a         4
    5 wheel2  a         5
    6 width1  a         6