I have a data like this and for male and female column, I just need the first row (highlited ones). Could you please how can I read just these numbers and exclude rests.
df <- structure(list(
col1 = c("First", "Frequency\nPercent", "CA", "TX"),
col2 = c("Sex_3585", "Male", "298026\n5\n9", "45678\n15\n89"),
col3 = c("", "Female", "57039\n10\n25", "64290\n100\n258")
),
class = "data.frame",
row.names = c(NA,-4L))
col1 col2 col3
1 First Sex_3585
2 Frequency\nPercent Male Female
3 CA 298026\n5\n9 57039\n10\n25
4 TX 45678\n15\n89 64290\n100\n258
First, I create a simple example of your data.
df <- structure(list(
col1 = c("First", "Frequency\nPercent", "CA", "TX"),
col2 = c("Sex_3585", "Male", "298026\n5\n9", "45678\n15\n89"),
col3 = c("", "Female", "57039\n10\n25", "64290\n100\n258")
),
class = "data.frame",
row.names = c(NA,-4L))
col1 col2 col3
1 First Sex_3585
2 Frequency\nPercent Male Female
3 CA 298026\n5\n9 57039\n10\n25
4 TX 45678\n15\n89 64290\n100\n258
Second, after reading in the file with read.csv
, one option is to separate the rows that have carriage returns (i.e., \n
). Then, we can group by the first column and keep only the first row for each group.
library(tidyverse)
df %>%
separate_rows(everything(), sep = "\n") %>%
group_by(col1) %>%
filter(row_number()==1)
Output
col1 col2 col3
<chr> <chr> <chr>
1 First Sex_3585 ""
2 Frequency Male "Female"
3 Percent Male "Female"
4 CA 298026 "57039"
5 TX 45678 "64290"