In the tidyverse there are limitations concerning the number of rows produced by some data-processing steps. Most prominently, mutate
expects the result to have the same number of rows as the original data set. For example, if we want density values for a variable x we could do:
library(magrittr)
df %>%
  dplyr::mutate(dx = density(x)$x,
                dy = density(x)$y)
This results in an error along the lines of "Caused by error: ! dx must be size 100 or 1, not 512."
But in many situations the number of rows changes during data processing! Is there an elegant way to incorporate this into tidyverse code?
All I have come up with so far is wrapping the steps where the number of rows changes in {}.
See the following example, where I interpolate y at new x values (which also changes the number of rows):
library(magrittr)
df %>%
  # Some data processing where row number stays the same
  dplyr::mutate(x2 = x*x,
                id = 1:dplyr::n()) %>%
  # Row number changes! So I use code inside {}
  {time_interpolate_for <- seq(min(.$x), max(.$x), 1)
   data.frame(x = time_interpolate_for,
              y = approx(.$x, .$y, xout = time_interpolate_for)$y)
  } %>%
  # Going on with the new data and processing it so that row number remains the same
  dplyr::mutate(xy_diff = x - y)
Is there a better way to do this?
Data used:
# Generate data
set.seed(1)
x <- sample(1:999, 100); y <- .5*x + rnorm(100)
df <- data.frame(x, y)
You can use summarise or reframe (now the recommended function) for such a task, but see the note below:
set.seed(1)
x <- sample(1:999, 100); y <- .5*x + rnorm(100)
df <- data.frame(x, y)
library(magrittr)
df %>%
  # Some data processing where row number stays the same
  dplyr::mutate(x2 = x*x, id = 1:dplyr::n()) %>%
  dplyr::reframe(
    x.0 = seq(min(x), max(x), 1),
    y.0 = approx(x, y, xout = x.0)$y) %>%
  # Going on with the new data and processing it so that row number remains the same
  dplyr::mutate(xy_diff = x.0 - y.0)
summarise also works, but since dplyr 1.1.0 returning more (or less) than one row per group triggers a deprecation warning, so pay attention to it:
...
  dplyr::summarise(
    x.0 = seq(min(x), max(x), 1),
    y.0 = approx(x, y, xout = x.0)$y) %>%
...
Warning: Returning more (or less) than 1 row per `summarise()` group was deprecated in dplyr 1.1.0.
ℹ Please use `reframe()` instead.
ℹ When switching from `summarise()` to `reframe()`, remember that `reframe()` always returns an ungrouped data frame and adjust accordingly.
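To illustrate the last point of that warning, here is a small sketch with grouped data. The grouping column g is hypothetical and added only for this example; the point is that reframe() keeps the group column in the output but drops the grouping itself.

```r
library(dplyr)

set.seed(1)
x <- sample(1:999, 100); y <- .5*x + rnorm(100)
# Hypothetical grouping column, only for this illustration
df <- data.frame(g = rep(c("a", "b"), each = 50), x, y)

res <- df %>%
  dplyr::group_by(g) %>%
  dplyr::reframe(q = quantile(x, c(0.25, 0.75)))  # two rows per group

nrow(res)                # 2 groups x 2 quantiles = 4 rows
dplyr::group_vars(res)   # character(0): the result is no longer grouped
```

If later steps still need per-group processing, the data has to be regrouped explicitly after reframe().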
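Note: in reframe/summarise I named the new columns x.0 and y.0 because dplyr verbs see the newly defined x instead of the x from the previous step (arguments are evaluated in order, so later arguments see the columns created by earlier ones).
You can also use the native pipe |> instead of magrittr's %>%.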
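A minimal sketch of why the new names matter, written with R's native pipe |> (which needs R >= 4.1). If the grid is called x again, approx() receives the new grid as x but still the original 100 values of y, so the lengths differ and reframe() stops with an error; distinct names keep the original columns visible.

```r
library(dplyr)  # native pipe |> requires R >= 4.1

set.seed(1)
x <- sample(1:999, 100); y <- .5*x + rnorm(100)
df <- data.frame(x, y)

# Reusing the name x: the second argument sees the new grid, not the old x
clash <- tryCatch(
  df |> reframe(x = seq(min(x), max(x), 1),
                y = approx(x, y, xout = x)$y),
  error = function(e) e
)
inherits(clash, "error")  # TRUE: x and y lengths differ inside approx()

# Distinct names leave the original x and y untouched
ok <- df |>
  reframe(x.0 = seq(min(x), max(x), 1),
          y.0 = approx(x, y, xout = x.0)$y) |>
  mutate(xy_diff = x.0 - y.0)
```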