I want to make spread data based on duplicate data. I want to accumulate column based on overlapping rows. For example, X2 column is repeated 5 times.
It looks similar to using spread function in tidyr package, but it's a little different. Could you give some idea for me?
INPUT:
X1 X2 X3
1 A 1 10
2 B 2 11
3 C 3 13
4 A 4 12
5 F 5 16
6 D 1 17
.
.
.
515 E 5 18
OUTPUT:
X2 X1.1 X3.1 X1.2 X3.2 ... 102 steps
1 1 A 10 D 17
2 2 B 11
3 3 C 13
4 4 A 12
5 5 F 16
With dplyr
and tidyr
we can group by X2
and build tibbles that we'll unnest
into the desired format.
I modified your data slightly to have a full example :
library(tidyverse)
df1 %>%
group_by(X2) %>%
summarize_all(~as.list(.) %>%
setNames(seq_along(.)) %>%
as_tibble %>%
list) %>%
unnest(.sep = ".")
# # A tibble: 3 x 5
# X2 X1.1 X1.2 X3.1 X3.2
# <int> <chr> <chr> <int> <int>
# 1 1 A A 10 12
# 2 2 B F 11 16
# 3 3 C D 13 17
And a legit tidy way :
df1 %>%
mutate(i = cumsum(X2 == 1)) %>%
gather(key, value, X1,X3) %>%
unite(key,key,i, sep=".") %>%
spread(key, value)
# X2 X1.1 X1.2 X3.1 X3.2
# 1 1 A A 10 12
# 2 2 B F 11 16
# 3 3 C D 13 17
data
df1 <- read.table(header=TRUE,stringsAsFactors=FALSE,text="X1 X2 X3
1 A 1 10
2 B 2 11
3 C 3 13
4 A 1 12
5 F 2 16
6 D 3 17")