How to insert a new row every time there is a break in the y
column numbering, while filling the other columns with NA
?
Edit: Iroha's suggestion based on tidyr::complete()
works fine, as do the others. However, when I apply it to millions of rows, it runs for hours.
It seems like a data.table
approach would be faster, but I never use it and I can't seem to find a solution.
Any help/advice would be greatly appreciated!
Initial data:
x <- c("a","a","a","a","a","a","a","a","a","b","b","b","b","b","b","b")
y <- c(1,1,2,4,4,4,7,7,11,1,3,3,3,6,6,8)
dat0 <- data.frame(x,y)
>dat0
x y
1 a 1
2 a 1
3 a 2
4 a 4
5 a 4
6 a 4
7 a 7
8 a 7
9 a 11
10 b 1
11 b 3
12 b 3
13 b 3
14 b 6
15 b 6
16 b 8
Desired output:
> dat1
x y
1 a 1
2 a 1
3 a 2
4 <NA> 3
5 a 4
6 a 4
7 a 4
8 <NA> 5
9 <NA> 6
10 a 7
11 a 7
12 <NA> 8
13 <NA> 9
14 <NA> 10
15 a 11
16 b 1
17 <NA> 2
18 b 3
19 b 3
20 b 3
21 <NA> 4
22 <NA> 5
23 b 6
24 b 6
25 <NA> 7
26 b 8
Thanks for help
Here is a data.table approach (in both examples below, be sure to start with setDT(dat0)
):
f <- \(k) k[CJ(y=1:max(k$y)), on="y"]
dat0[, f(.SD), x, .SDcols=c("y", "x")][, c(2,3)]
Output:
y x
<int> <char>
1: 1 a
2: 1 a
3: 2 a
4: 3 <NA>
5: 4 a
6: 4 a
7: 4 a
8: 5 <NA>
9: 6 <NA>
10: 7 a
11: 7 a
12: 8 <NA>
13: 9 <NA>
14: 10 <NA>
15: 11 a
16: 1 b
17: 2 <NA>
18: 3 b
19: 3 b
20: 3 b
21: 4 <NA>
22: 5 <NA>
23: 6 b
24: 6 b
25: 7 <NA>
26: 8 b
y x
Alternatively, you could do this:
dat0[, x2:=x][dat0[, .(y=1:max(y)),x], on=.(x,y), .(x=x2, y)]
Output:
x y
<char> <int>
1: a 1
2: a 1
3: a 2
4: <NA> 3
5: a 4
6: a 4
7: a 4
8: <NA> 5
9: <NA> 6
10: a 7
11: a 7
12: <NA> 8
13: <NA> 9
14: <NA> 10
15: a 11
16: b 1
17: <NA> 2
18: b 3
19: b 3
20: b 3
21: <NA> 4
22: <NA> 5
23: b 6
24: b 6
25: <NA> 7
26: b 8