How can I list files in a folder based on version number?

I have a folder containing .txt files named like the following:

A_COR_001_I
A_COR_001_II
A_COR_002_I
A_COR_002_II
A_COR_003_I
A_COR_003_II
A_COR_003_III
A_COR_004_I
A_COR_004_II
A_COR_004_III
A_COR_004_IV
...

The roman numeral at the end of each name is the draft number of a document, and the document itself is identified by the preceding arabic number, like 002. I am trying to extract only the final drafts, using a regex pattern with list.files(). The problem is that each document has an unpredictable number of drafts, so I need a way to group the drafts of each document and single out the one with the highest numeral: A_COR_004_IV rather than A_COR_004_III or any other. Any ideas on how to proceed? Thanks in advance!

Solution

Base R has an as.roman() function (in the utils package), which allows "Simple manipulation of... roman numerals".

So split the file names into groups based on what appears before the last underscore (i.e. "A_COR_001" to "A_COR_004"), then within each group pick the element with the largest roman numeral (i.e. the max numeric value after the final underscore). Note that the |> pipe and the \(l) function shorthand used below need R 4.1 or later.

split(files, sub("_[^_]+$", "", files)) |>
    lapply(
        \(l) l[which.max(as.roman(sub(".*_", "", l)))]
    )
# $A_COR_001
# [1] "A_COR_001_II"

# $A_COR_002
# [1] "A_COR_002_II"

# $A_COR_003
# [1] "A_COR_003_III"

# $A_COR_004
# [1] "A_COR_004_IV"
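
The question mentions .txt files, so names coming from list.files() would presumably carry an extension, and as.roman() cannot parse something like "IV.txt". A minimal sketch of one way to handle that, assuming every file ends in .txt and follows the naming scheme above (the vector files_txt and the helper variables are made up for illustration):

```r
# Hypothetical names, as list.files(pattern = "\\.txt$") might return them
files_txt <- c("A_COR_001_I.txt", "A_COR_001_II.txt",
               "A_COR_004_III.txt", "A_COR_004_IV.txt")

# Drop the extension before parsing the name
stem <- tools::file_path_sans_ext(files_txt)

# Document id: everything before the last underscore
doc <- sub("_[^_]+$", "", stem)

# Draft number: the roman numeral after the last underscore, as an integer
draft <- as.integer(as.roman(sub(".*_", "", stem)))

# For each document, keep the file with the highest draft number
final <- vapply(
    split(seq_along(files_txt), doc),
    function(i) files_txt[i][which.max(draft[i])],
    character(1)
)
final
```

Here final is a named character vector with one full file name per document, so it can be passed straight on to whatever reads the files.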

I imagine this will not be a problem here, but note that the docs state:

Only numbers between 1 and 3999 have a unique representation as roman numbers, and hence others result in as.roman(NA).

Interestingly, this is actually just structure(NA_integer_, class = "roman").
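
A small sketch of what that means in practice, using 0 as an example of a value outside the representable range. It is only an illustration (the drafts vector is made up), but it shows that an NA draft is skipped rather than raising an error when picking the maximum:

```r
# 0 is outside the representable range, so it becomes NA
drafts <- as.roman(c(2L, 0L, 4L))

is.na(drafts)
# [1] FALSE  TRUE FALSE

# which.max() ignores the NA and returns the position of IV
which.max(as.integer(drafts))
# [1] 3
```

So a name whose numeral cannot be converted would quietly drop out of the comparison, which is worth checking for if the names are not guaranteed to be clean.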

Incidentally, list.files() returns its results in lexicographic order, which matches draft order as long as no document has more than 8 versions (IX sorts before V, so the order breaks from the ninth draft on). In that case you can simply take the last element of each group: lapply(split(files, sub("_[^_]+$", "", files)), tail, 1).
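
To see where the shortcut stops working, here is a quick check of how the first nine numerals sort as plain strings. It uses sort(method = "radix"), which compares in the C locale, as a stand-in for the ordering of list.files(); that equivalence is an assumption of this sketch:

```r
numerals <- c("I", "II", "III", "IV", "V", "VI", "VII", "VIII", "IX")

# Plain string sort: IX lands between IV and V
sorted <- sort(numerals, method = "radix")
sorted
# [1] "I"    "II"   "III"  "IV"   "IX"   "V"    "VI"   "VII"  "VIII"

# The last element is VIII, although IX is the highest draft
tail(sorted, 1)
# [1] "VIII"
```

With nine or more drafts, tail() would therefore return the eighth draft, which is why the as.roman() approach is the safer default.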

Data

files <- c(
    "A_COR_001_I", "A_COR_001_II",
    "A_COR_002_I", "A_COR_002_II",
    "A_COR_003_I", "A_COR_003_II", "A_COR_003_III",
    "A_COR_004_I", "A_COR_004_II", "A_COR_004_III", "A_COR_004_IV"
)