I have a dataframe (tibble) with over 3000 rows and over 100 cols. I want to count every 5th columns (e.g., StoZ.x, StoZ.y, StoZ.x.x, StoZ.y.y like that) if any data presence or not. If data is present on 5th column, then total count would be added by creating a new column.
A tibble: 1,500 × 23
chr start end.x end.y StoZ.x Tier.x Gene.x cohort.x end.x.x StoZ.y Tier.y Gene.y cohort.y
> > <chr> <dbl> <dbl> <dbl> <dbl> <chr> <chr> <chr> <dbl> <dbl> <chr> <chr> <chr>
1 chr1 2220001 2230000 2230000 -2.68 T2 SKI cr19naBile… NA NA NA NA NA
2 chr1 2230001 2240000 NA NA NA NA NA NA NA NA NA NA
3 chr1 2240001 2250000 NA NA NA NA NA NA NA NA NA NA
4 chr1 2250001 2260000 2260000 -2.24 T2 SKI cr19naBile… 2260000 -2.90 T2 SKI ji15pnH…
5 chr1 2270001 2280000 NA NA NA NA NA NA NA NA NA NA
6 chr1 2280001 2290000 NA NA NA NA NA NA NA NA NA NA
7 chr1 2290001 2300000 NA NA NA NA NA NA NA NA NA NA
8 chr1 2550001 2560000 NA NA NA NA NA 2560000 -2.13 T1 TNFRS… ji15pnH…
The expected table would be like that, a addition column Count will refer the presence of gene in particular window.
A tibble: 1,500 × 23
chr start end.x end.y StoZ.x Tier.x Gene.x cohort.x end.x.x StoZ.y Tier.y Gene.y cohort.y Count
> > > <chr> <dbl> <dbl> <dbl> <dbl> <chr> <chr> <chr> <dbl> <dbl> <chr> <chr> <chr> <num>
1 chr1 2220001 2230000 2230000 -2.68 T2 SKI cr19naBile… NA NA NA NA NA 1
2 chr1 2230001 2240000 NA NA NA NA NA NA NA NA NA NA 0
3 chr1 2240001 2250000 NA NA NA NA NA NA NA NA NA NA 0
4 chr1 2250001 2260000 2260000 -2.24 T2 SKI cr19naBile… 2260000 -2.90 T2 SKI ji15pnH…. 2
5 chr1 2270001 2280000 NA NA NA NA NA NA NA NA NA NA 0
6 chr1 2280001 2290000 NA NA NA NA NA NA NA NA NA NA 0
7 chr1 2290001 2300000 NA NA NA NA NA NA NA NA NA NA 0
8 chr1 2550001 2560000 NA NA NA NA NA 2560000 -2.13 T1 TNFRS… ji15pnH… 1
You can use pick
from the dplyr
package to select the columns whose name starts with "StoZ"
and sum the number of non-NA values by row.
library(dplyr) # packageVersion("dplyr") >= 1.1.0
ndf |>
mutate(count = rowSums(!is.na(pick(starts_with("StoZ")))))
#> chr start end.x end.y StoZ.x Tier.x Gene.x cohort.x end.x.x StoZ.y Tier.y Gene.y cohort.y count
#> 1 chr1 2220001 2230000 2230000 -2.68 T2 SKI cr19naBile… NA NA <NA> <NA> <NA> 1
#> 2 chr1 2230001 2240000 NA NA <NA> <NA> <NA> NA NA <NA> <NA> <NA> 0
#> 3 chr1 2240001 2250000 NA NA <NA> <NA> <NA> NA NA <NA> <NA> <NA> 0
#> 4 chr1 2250001 2260000 2260000 -2.24 T2 SKI cr19naBile… 2260000 -2.90 T2 SKI ji15pnH… 2
#> 5 chr1 2270001 2280000 NA NA <NA> <NA> <NA> NA NA <NA> <NA> <NA> 0
#> 6 chr1 2280001 2290000 NA NA <NA> <NA> <NA> NA NA <NA> <NA> <NA> 0
#> 7 chr1 2290001 2300000 NA NA <NA> <NA> <NA> NA NA <NA> <NA> <NA> 0
#> 8 chr1 2550001 2560000 NA NA <NA> <NA> <NA> 2560000 -2.13 T1 TNFRS… ji15pnH… 1