I am trying to solve an exercise from R for Data Science (2E), chapter 16 (section 16.5.4, question 1), which asks you to extract the middle letter of every name in the dataset. I wrote the code below to return the middle letter if the name has an odd number of letters, or the middle two letters if it has an even number of letters.
library(tidyverse)
library(babynames)

babynames |>
  mutate(
    length = str_length(name),
    middle = if_else((length / 2) %% 2 != 0,
      str_sub(name, ceiling(length / 2), ceiling(length / 2)),
      str_sub(name, length / 2, (length / 2) + 1)
    )
  )
The code gives the expected result except when the name has 6 letters. Instead of extracting the middle two letters, it returns only the first of the two:
# A tibble: 1,924,665 × 7
    year sex   name          n   prop length middle
   <dbl> <chr> <chr>     <int>  <dbl>  <int> <chr>
 1  1880 F     Mary       7065 0.0724      4 ar
 2  1880 F     Anna       2604 0.0267      4 nn
 3  1880 F     Emma       2003 0.0205      4 mm
 4  1880 F     Elizabeth  1939 0.0199      9 a
 5  1880 F     Minnie     1746 0.0179      6 n
 6  1880 F     Margaret   1578 0.0162      8 ga
 7  1880 F     Ida        1472 0.0151      3 d
 8  1880 F     Alice      1414 0.0145      5 i
 9  1880 F     Bertha     1320 0.0135      6 r
10  1880 F     Sarah      1288 0.0132      5 r
# … with 1,924,655 more rows
# ℹ Use `print(n = ...)` to see more rows
I don't understand why the code makes an exception for names with 6 letters. What could be the reason behind this?
Your check for odd/even lengths is incorrect. Look at what it returns for lengths 1 to 10:
data.frame(length = 1:10) |>
  mutate(compare = (length / 2) %% 2 != 0)
#    length compare
# 1       1    TRUE
# 2       2    TRUE
# 3       3    TRUE
# 4       4   FALSE
# 5       5    TRUE
# 6       6    TRUE
# 7       7    TRUE
# 8       8   FALSE
# 9       9    TRUE
# 10     10    TRUE
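To pin down exactly which lengths are affected, a small base-R check can compare the question's condition with an ordinary parity test over lengths 1 to 12. This is only a sketch; `len`, `buggy_odd` and `true_odd` are illustrative names, not part of the original code:

```r
# Lengths to test (illustrative variable names)
len <- 1:12

# Condition from the question: FALSE only when len is divisible by 4
buggy_odd <- (len / 2) %% 2 != 0

# Ordinary odd/even test on the length itself
true_odd <- len %% 2 != 0

# Even lengths that the question's condition wrongly treats as odd
len[buggy_odd & !true_odd]
#> [1]  2  6 10
```

Every length that is even but not a multiple of 4 ends up in the single-letter branch, which matches the 6-letter names in the question's output.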
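Notice that the check does not separate even from odd lengths. Because of the extra /2, the condition is FALSE only when the length is divisible by 4, so even lengths that are not multiples of 4 (2, 6, 10, ...) go to the odd-length branch and return a single letter. That is why Minnie and Bertha (6 letters) come back as "n" and "r", while Mary (4) and Margaret (8) work. Test the length itself with length %% 2 instead: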
babynames |>
  mutate(
    length = str_length(name),
    middle = if_else(length %% 2 != 0,
      str_sub(name, ceiling(length / 2), ceiling(length / 2)),
      str_sub(name, length / 2, (length / 2) + 1)
    )
  )
#     year sex   name          n   prop length middle
#    <dbl> <chr> <chr>     <int>  <dbl>  <int> <chr>
#  1  1880 F     Mary       7065 0.0724      4 ar
#  2  1880 F     Anna       2604 0.0267      4 nn
#  3  1880 F     Emma       2003 0.0205      4 mm
#  4  1880 F     Elizabeth  1939 0.0199      9 a
#  5  1880 F     Minnie     1746 0.0179      6 nn
#  6  1880 F     Margaret   1578 0.0162      8 ga
# ...
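As a side note, the branch can be dropped entirely. For a name of length n, the middle letters run from position floor((n + 1) / 2) to ceiling((n + 1) / 2): when n is odd both are the same position, and when n is even they are the two central positions. The sketch below is an alternative to the if_else() version, not a change to it. It uses a small hand-made tibble instead of the full babynames data (`names_df` and `result` are illustrative names) and loads only dplyr and stringr, which tidyverse attaches anyway:

```r
library(dplyr)
library(stringr)

# A few names taken from the output above, instead of the full dataset
names_df <- tibble(name = c("Mary", "Elizabeth", "Minnie", "Ida", "Bertha"))

result <- names_df |>
  mutate(
    length = str_length(name),
    # Odd n: (n + 1) / 2 is a whole number, so start == end and one letter is returned.
    # Even n: (n + 1) / 2 ends in .5, so floor/ceiling pick the two central letters.
    middle = str_sub(name, floor((length + 1) / 2), ceiling((length + 1) / 2))
  )

# Expected middles: "ar", "a", "nn", "d", "rt"
result$middle
```

The same mutate() call could be applied to babynames directly; the small tibble just keeps the check quick.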