I have a data frame that looks like this. I want inside the dplyr pipeline to replace only the 7 first rows of the threshold column with the values that come from the manufacturer column.
library(tidyverse)
mpg %>%
arrange(cty) %>%
mutate(threshold=NA) %>%
select(manufacturer,cty, threshold)
#> # A tibble: 234 × 3
#> manufacturer cty threshold
#> <chr> <int> <lgl>
#> 1 dodge 9 NA
#> 2 dodge 9 NA
#> 3 dodge 9 NA
#> 4 dodge 9 NA
#> 5 jeep 9 NA
#> 6 chevrolet 11 NA
#> 7 chevrolet 11 NA
#> 8 chevrolet 11 NA
#> 9 dodge 11 NA
#> 10 dodge 11 NA
#> # … with 224 more rows
Created on 2022-08-31 with reprex v2.0.2
I want my data to look like this
#> manufacturer cty threshold
#> <chr> <int> <lgl>
#> 1 dodge 9 dodge
#> 2 dodge 9 dodge
#> 3 dodge 9 dodge
#> 4 dodge 9 dodge
#> 5 jeep 9 jeep
#> 6 chevrolet 11 chevrolet
#> 7 chevrolet 11 chevrolet
#> 8 chevrolet 11 NA
#> 9 dodge 11 NA
#> 10 dodge 11 NA
any help and guidance as always are highly appreciated
USe case_when
to create a logical condition with row_number()
for replacement. In addition, there is no need to create a blank column i.e. the NAs
can be filled by default in case_when
library(dplyr)
library(ggplot2)
mpg %>%
arrange(cty) %>%
select(manufacturer, cty) %>%
mutate(threshold = case_when(row_number() < 7 ~manufacturer))
-output
# A tibble: 234 × 3
manufacturer cty threshold
<chr> <int> <chr>
1 dodge 9 dodge
2 dodge 9 dodge
3 dodge 9 dodge
4 dodge 9 dodge
5 jeep 9 jeep
6 chevrolet 11 chevrolet
7 chevrolet 11 <NA>
8 chevrolet 11 <NA>
9 dodge 11 <NA>
10 dodge 11 <NA>
# … with 224 more rows