I'm scraping cars information from a website and I'm getting not constant and not so clean data from it. I'm trying to clean and arrange this data into a data frame.
For example:
dd <- data.frame(measure = c("wheel", "wheel", "length", "width", "wheel", "width"), value = 1:6, model = "a", stringsAsFactors = F)
dd
measure value model
1 wheel 1 a
2 wheel 2 a
3 length 3 a
4 width 4 a
5 wheel 5 a
6 width 6 a
It this example, I have 3 values of wheel
and 2 of width
. In my real data, it's not always the same thing that is repeated, it may or may not have repetition and it could be repeated more than once.
I need to reshape this table to have one line per model
, however I don't want to aggregate the value
that have a common measure
. Precisely, I would want the table to become:
model length wheel wheel1 wheel2 width width1
1 a 3 1 2 5 4 6
This was obtained using dcast
on manually modified data:
library(reshape2)
res <- data.frame(measure = c("wheel", "wheel1", "length", "width", "wheel2", "width1"), value = 1:6, model = "a", stringsAsFactors = F)
dcast(res, model ~ measure)
I need either a way to modify dcast
so it doesn't aggregate the measure
or automatically modify dd
so it becomes res
.
I've tried something ugly and not exactly what I needed:
dd[duplicated(dd$measure), "measure"] <- paste0(dd[duplicated(dd$measure), "measure"] , 1:3)
dd
measure value model
1 wheel 1 a
2 wheel1 2 a
3 length 3 a
4 width 4 a
5 wheel2 5 a
6 width3 6 a
This code isn't working because width
get the index 3
and not 2
. Also, this would not adjust to another table like:
dd2 <- data.frame(measure = c("wheel", "wheel", "length", "width", "wheel"), value = 1:5, model = "a", stringsAsFactors = F)
dd2[duplicated(dd2$measure), "measure"] <- paste0(dd2[duplicated(dd2$measure), "measure"] , 1:3)
Error in `[<-.data.frame`(`*tmp*`, duplicated(dd2$measure), "measure", :
replacement has 3 rows, data has 2
Anyway, how could I modify my variable measure
dynamically so all the words are unique?
Can you use dplyr::mutate
as below:
dd <- dd %>%
group_by(model, measure) %>%
mutate(measure2 = paste0(measure, ifelse(row_number() > 1, row_number() - 1, ""))) %>%
ungroup() %>%
mutate(measure = measure2) %>%
select(measure, model, value)
dd
# A tibble: 6 x 3
measure model value
<chr> <chr> <int>
1 wheel a 1
2 wheel1 a 2
3 length a 3
4 width a 4
5 wheel2 a 5
6 width1 a 6