
How to determine if files/folders have an ascending numbering pattern?


I am trying to determine whether all the files/folders in a directory have an ascending numbering pattern at the same place in their names.

If the numbers were always at a fixed position in every name, this would be easy:

ls $HOME/dir

1. Some String
2. Some String- Part 4
3. Some String- Part 5

Here I would simply use something like ls $HOME/dir | sort -V | grep -Eo '^[0-9]+'

The command outputs 1 2 3, so it is easy to conclude that the files/folders follow an ascending numbering pattern.

There are two problems, though:

  1. The numbers are not necessarily at the start of the name, as they are above
  2. There can sometimes be other, unrelated numbers elsewhere in the name

==========================================

ls $HOME/dir

Lecture 1 - Some String
Lecture 2 - Some String - Part 4
Lecture 3 - Some String - Part 5

Expected Output - 1 2 3

The main thing is that I need grep to output numbers only if they appear in ascending order at the very same position in every filename.

==========================================

ls $HOME/dir

1. Some String
Some String - Part 2
Some String - Part 3

For something like this, grep shouldn't output anything at all: even though the names contain ascending numbers, the numbers are not at the same place throughout.

==========================================

PS: The 'Some String' part in all my examples would be different for each file/folder. Only whether the position of the ascending numbers (if any) is constant should be considered.

One final example:

ls $HOME/dir

CB) Lecture 1 xyz
CB) Lecture 2 abc-part 8
CB) Lecture 3 pqr-part 9

Expected Output - 1 2 3


Solution

  • Here is one solution using AWK:

    printnumbers.awk

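    BEGIN {
        # A run of digits followed by a non-alphanumeric character or end of line.
        numberRegex = "[0-9]+([^A-Za-z0-9]|$)"
    }

    # Remember where the first number starts on the first line.
    NR == 1 {
        numberPos = match($0, numberRegex)
    }

    # This line has its first number at the same position: collect its digits.
    # numberPos > 0 guards against input whose first line contains no number.
    numberPos > 0 && match($0, numberRegex) == numberPos {
        matchedString = substr($0, RSTART, RLENGTH)
        match(matchedString, "[0-9]+")
        result = result substr(matchedString, RSTART, RLENGTH) "\n"
        next
    }

    # Any other line breaks the pattern: discard everything and stop.
    {
        result = ""
        exit 1
    }

    END {
        printf("%s", result)
    }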
    

    Then run

    $ ls $HOME/dir | sort -V | awk -f printnumbers.awk
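Note that printnumbers.awk only checks that the first number sits at the same position on every line; it does not verify that the numbers ascend. A minimal sketch of a follow-up check is shown here. The function name is_ascending is hypothetical, and the sketch assumes one non-negative integer per input line.

```shell
# Hypothetical helper: succeeds only if stdin holds one integer per line
# and each value is strictly larger than the one before it.
is_ascending() {
    awk '
    # Reject any line that is not a plain run of digits.
    $0 !~ /^[0-9]+$/ { bad = 1; exit }
    # Reject a value that does not exceed the previous one.
    NR > 1 && $0 + 0 <= prev { bad = 1; exit }
    { prev = $0 + 0 }
    # Empty input is treated as "no ascending pattern".
    END { if (bad || NR == 0) exit 1 }
    '
}
```

It could be appended to the pipeline, for example: ls "$HOME/dir" | sort -V | awk -f printnumbers.awk | is_ascending && echo ascending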
    

    Edit 2021-05-14

    A second approach is to split each line into fields with non-digits as separators. Then each field is either a number or an empty string. For each line we check the fields to see if a sequence of consecutive numbers starting from one is formed.
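To make the splitting step concrete, here is a small sketch that prints each field in brackets. The function name digit_fields is hypothetical and exists only for illustration.

```shell
# Hypothetical helper: show how awk splits each line when every run of
# non-digit characters is treated as a field separator.
digit_fields() {
    awk '
    BEGIN { FS = "[^0-9]+" }
    {
        line = ""
        # Each field is either a number or an empty string.
        for (i = 1; i <= NF; i++) line = line "[" $i "]"
        print line
    }'
}
```

For the name "CB) Lecture 2 abc-part 8" this prints [][2][8]: the text before the first number produces an empty leading field, so the lecture number lands in field 2 on every line of that example.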

    Here is the logic:

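    BEGIN {
        # Any run of non-digits separates fields, so each field is a number or empty.
        FS = "[^0-9]+"
    }

    {
        if (NF > numbersConsecutiveLen) {
            numbersConsecutiveLen = NF
        }
        # Check every field position seen so far, so that a position missing
        # on this line (an empty $i) no longer counts as consecutive.
        for (i = 1; i <= numbersConsecutiveLen; i++) {
            numbersConsecutive[i] = ($i == NR) && ((NR == 1) || numbersConsecutive[i])
        }
    }

    END {
        # The pattern holds if some field position held 1, 2, 3, ... on lines 1, 2, 3, ...
        consecutiveNumbersFound = 0
        for (i = 1; i <= numbersConsecutiveLen; i++) {
            if (numbersConsecutive[i]) {
                consecutiveNumbersFound = 1
            }
        }
        if (consecutiveNumbersFound) {
            for (i = 1; i <= NR; i++) {
                print i
            }
        }
    }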
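
The logic above requires the numbers to be exactly 1, 2, 3, and so on. As a variation that is not part of the original answer, the sketch below accepts any strictly ascending numbers in the same field position and prints them. The function name ascending_field_numbers is hypothetical. Like the second approach, it compares field positions rather than character positions.

```shell
# Hypothetical variation: print the numbers from the first field position
# that holds a strictly ascending number on every input line.
ascending_field_numbers() {
    awk '
    BEGIN { FS = "[^0-9]+" }
    {
        if (NF > maxNF) maxNF = NF
        for (i = 1; i <= maxNF; i++) {
            if (NR == 1) {
                ok[i] = ($i != "")
            } else {
                # The position must hold a number larger than on the previous line.
                ok[i] = ok[i] && ($i != "") && ($i + 0 > prev[i] + 0)
            }
            prev[i] = $i
            val[NR, i] = $i
        }
    }
    END {
        for (i = 1; i <= maxNF; i++) {
            if (ok[i]) {
                for (n = 1; n <= NR; n++) print val[n, i]
                exit 0
            }
        }
        # No position qualified: print nothing and signal failure.
        exit 1
    }'
}
```

It reads names from standard input in the same way as the scripts above, for example: ls "$HOME/dir" | sort -V | ascending_field_numbers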