I have multiple files which is formatted like:
> Test1.txt
NameNo Team etc
1: AS001-A. 8 773
2: AS002-S. 7 631
3: AS003-G. 8 970
> Test2.txt
NameNo Team etc
1: AB001-A. 2 773
2: AB002-S. 6 631
3: AB003-G. 6 970
> Test2.txt
NameNo Team etc
1: AR001-A. 1 773
2: AR002-S. 1 631
3: AR003-G. 1 970
Here is my code:
files <- list.files(pattern ="*.txt", full.names = TRUE)
files
> files
Test1.txt
Test2.txt
Test3.txt
I then used table(Team)
one by one.
I need to summarize column Team
Desired output:
Item Team
Test1 7,8
Test2 2,6
Test3 1
Thanks for all help.
# get all the textfiles starting with "Test"
list.files(pattern ="^Test.*\\.txt$") |>
# loop through the files, reading them in, adding a column for their filename
map_dfr(~read_csv(.) |> mutate(file = str_remove(.x, ".txt"))) |>
# separate the weird format into columns using regex
separate_wider_regex(`NameNo Team etc`, patterns = c("\\d+:\\s+", NameNo = "\\S+", "\\s+", Team = "\\d+", "\\s+", etc = "\\d+")) |>
# get the unique teams for each file, sorted
summarise(Team = paste0(sort(unique(`Team`)), collapse = ","), .by = file)
Output:
# A tibble: 3 × 2
file Team
<chr> <chr>
1 Test1 7,8
2 Test2 2,6
3 Test3 1
If the row numbers aren't actually part of the files, you can do this:
f = list.files(pattern ="^Test.*\\.txt$")
tibble(Item = str_remove(f, "\\.txt"),
Team = map_chr(f, ~read.table(., header = TRUE)$Team |> unique() |> sort() |> paste0(collapse = ",")))