I am trying to determine whether all the files/folders in a directory have an ascending numbering pattern at the same position in their names.
If the numbers were always at a fixed position in every case, this would be easy:
ls $HOME/dir
1. Some String
2. Some String- Part 4
3. Some String- Part 5
Here I would simply use something like
ls "$HOME/dir" | sort -V | grep -Eo '^[0-9]+'
The command outputs 1 2 3, so it is easy to conclude that the files/folders have an ascending numbering pattern.
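For the easy case, the pipeline can be tried without a real directory by feeding sample names through printf. This is a minimal sketch that assumes GNU sort (for `-V`) and names that start with the number; the `names` variable is only sample data. Using `^[0-9]+` rather than `^[0-9]` keeps multi-digit numbers such as 10 in one piece.

```shell
# Sample names standing in for the output of: ls "$HOME/dir"
names='1. Some String
2. Some String- Part 4
3. Some String- Part 5
10. Some String- Part 6'

# sort -V orders 10 after 3 (version sort); grep -Eo prints only the
# leading digits of each name: 1, 2, 3, 10 on separate lines
printf '%s\n' "$names" | sort -V | grep -Eo '^[0-9]+'
```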
Now there are two problems:
==========================================
ls $HOME/dir
Lecture 1 - Some String
Lecture 2 - Some String - Part 4
Lecture 3 - Some String - Part 5
Expected Output - 1 2 3
The main thing is that I need grep to output the numbers only if they appear in ascending order at the very same position in every filename.
==========================================
ls $HOME/dir
1. Some String
Some String - Part 2
Some String - Part 3
For something like this, grep shouldn't output anything at all: even though the names contain ascending numbers, they are not at the same position throughout.
==========================================
PS: The 'Some String' part in all my examples differs for each file/folder. Only whether the ascending numbers (if any) sit at a constant position should be considered.
One final example:
ls $HOME/dir
CB) Lecture 1 xyz
CB) Lecture 2 abc-part 8
CB) Lecture 3 pqr-part 9
Expected Output - 1 2 3
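To make "the same place" concrete, it helps to look at character offsets. awk's `match()` returns the 1-based index where a regex first matches, or 0 if it does not match at all. The sketch below (the helper name `first_number_pos` is hypothetical) prints that index for each name: in the "Lecture" example every line reports 9, while in the third example the first line reports 1 and the other two report 20.

```shell
# Print "<index of first number>: <name>" for each input line.
# match() returns the 1-based start of the first match, or 0 if none.
first_number_pos() {
    awk '{ print match($0, /[0-9]+/) ": " $0 }'
}

# Same offset (9) on every line
printf '%s\n' 'Lecture 1 - Some String' \
              'Lecture 2 - Some String - Part 4' \
              'Lecture 3 - Some String - Part 5' | first_number_pos

# Offsets differ (1, 20, 20), so this listing should be rejected
printf '%s\n' '1. Some String' \
              'Some String - Part 2' \
              'Some String - Part 3' | first_number_pos
```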
Here is one solution using AWK:
printnumbers.awk
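BEGIN {
    # A run of digits followed by a non-alphanumeric character or end of line
    numberRegex = "[0-9]+([^A-Za-z0-9]|$)"
}
NR == 1 {
    # Where the first number starts on the first line (0 = no number at all)
    numberPos = match($0, numberRegex)
    if (numberPos == 0) {
        exit 1
    }
}
match($0, numberRegex) == numberPos {
    # Keep only the digits, dropping the trailing separator
    matchedString = substr($0, RSTART, RLENGTH)
    match(matchedString, "[0-9]+")
    number = substr(matchedString, RSTART, RLENGTH)
    # Accept the line only if its number is larger than the previous one
    if (NR == 1 || number + 0 > previous + 0) {
        previous = number
        result = result number "\n"
        next
    }
}
{
    # Number missing, at another position, or not ascending: discard output
    result = ""
    exit 1
}
END {
    printf("%s", result)
}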
Then run it on the version-sorted listing:
$ ls "$HOME/dir" | sort -V | awk -f printnumbers.awk
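To try the script without a real directory, the sketch below wraps a condensed form of the same position check in a shell function and feeds it the question's examples with printf. The function name `printnumbers` is hypothetical, it calls `sort -V` itself (GNU sort assumed), and it exits early when the first line contains no number, so a number-free listing prints nothing.

```shell
# printnumbers (hypothetical helper): reads names on stdin, prints the
# number found at the same character position on every line, or prints
# nothing and returns 1 if the position differs on any line.
printnumbers() {
    sort -V | awk '
        BEGIN { numberRegex = "[0-9]+([^A-Za-z0-9]|$)" }
        NR == 1 { numberPos = match($0, numberRegex); if (numberPos == 0) exit 1 }
        match($0, numberRegex) == numberPos {
            s = substr($0, RSTART, RLENGTH)   # digits plus one separator
            match(s, /[0-9]+/)                # isolate the digits
            result = result substr(s, RSTART, RLENGTH) "\n"
            next
        }
        { result = ""; exit 1 }               # position mismatch
        END { printf("%s", result) }'
}

# Final example from the question: prints 1, 2, 3
printf '%s\n' 'CB) Lecture 1 xyz' 'CB) Lecture 2 abc-part 8' \
              'CB) Lecture 3 pqr-part 9' | printnumbers

# Third example: the number moves, so nothing is printed and the status is 1
printf '%s\n' '1. Some String' 'Some String - Part 2' \
              'Some String - Part 3' | printnumbers || echo "no common position"
```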
Edit 2021-05-14
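A second approach is to split each line into fields, using runs of non-digits as separators, so that each field is either a number or an empty string. Then, for each field index i, we check whether field i on line n equals n on every line, i.e. whether that field forms the sequence 1, 2, 3, … down the listing. Because field i is compared with the line number, the input should be sorted first, as in the first approach. Note that here "the same place" means the same field index (the i-th number in the name), not the same character offset.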
Here is the logic:
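BEGIN {
    # Any run of non-digits is a separator, so each field is a number or ""
    FS = "[^0-9]+"
}
{
    # Widest line seen so far
    if (NF > numbersConsecutiveLen) {
        numbersConsecutiveLen = NF
    }
    # Field i stays true only while field i of line n equals n. Looping to
    # numbersConsecutiveLen (not NF) also clears indices this line lacks.
    for (i = 1; i <= numbersConsecutiveLen; i++) {
        numbersConsecutive[i] = ($i == NR) && ((NR == 1) || numbersConsecutive[i])
    }
}
END {
    consecutiveNumbersFound = 0
    for (i = 1; i <= numbersConsecutiveLen; i++) {
        if (numbersConsecutive[i]) {
            consecutiveNumbersFound = 1
        }
    }
    # Some field held 1, 2, ..., NR on successive lines, so print that sequence
    if (consecutiveNumbersFound) {
        for (i = 1; i <= NR; i++) {
            print i
        }
    }
}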
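The field split is easiest to see on a concrete name. In the sketch below, `show_fields` (a hypothetical helper) prints each field in square brackets so that empty fields are visible, and `consecutive_from_one` (also hypothetical) is a condensed copy of the check above. It loops over every field index seen so far, not just up to the current line's NF, so a shorter line cannot leave a stale true flag behind for an index it does not have.

```shell
# show_fields (hypothetical): print the fields produced by FS = "[^0-9]+"
show_fields() {
    awk 'BEGIN { FS = "[^0-9]+" }
         { out = ""
           for (i = 1; i <= NF; i++) out = out "[" $i "]"
           print out }'
}

# consecutive_from_one (hypothetical): print 1..NR if some field index
# holds n on line n for every line, otherwise print nothing
consecutive_from_one() {
    awk 'BEGIN { FS = "[^0-9]+" }
         {
             if (NF > len) len = NF
             for (i = 1; i <= len; i++)
                 ok[i] = ($i == NR) && (NR == 1 || ok[i])
         }
         END {
             found = 0
             for (i = 1; i <= len; i++) if (ok[i]) found = 1
             if (found) for (i = 1; i <= NR; i++) print i
         }'
}

# Leading non-digits give an empty first field: prints [][2][4]
printf '%s\n' 'Lecture 2 - Some String - Part 4' | show_fields

# Final example from the question: prints 1, 2, 3
printf '%s\n' 'CB) Lecture 1 xyz' 'CB) Lecture 2 abc-part 8' \
              'CB) Lecture 3 pqr-part 9' | consecutive_from_one
```

On a real directory this would be run as `ls "$HOME/dir" | sort -V | consecutive_from_one`.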