Is there an easy way in bash to check whether a string is a subsequence of another string? More precisely, I need a subsequence with one extra rule, which I explain below.
Some subsequences of "apple" are "aple", "al", "pp" and "ale". I only want the subsequences that start and end with the same letters as the original string, so only "aple" and "ale" qualify.
I have made the following program:
#!/bin/bash
while read line
do
search=$(echo "$line" | tr -s 'A-Za-z' | sed 's/./\.\*&/g;s/^\.\*//' )
expr match "$1" "$search" >/dev/null && echo "$line"
done
It is run as follows:
./program.sh greogdgedlqfe < words.txt
This program works, but is very slow.
It takes every line of the file, turns it into a regular expression, checks whether the given argument matches it, and if so prints the original line. For example:
one of the lines contains the word google
$search becomes g.*o.*g.*l.*e (repeated letters are squeezed, extra rule)
the given argument is then matched against that expression and, if it matches, the line is printed: google
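The transformation described above can be tried on its own. The following is a minimal sketch that wraps the same `tr` and `sed` pipeline from the script in a helper; the function name `word_to_pattern` is hypothetical and not part of the original program.

```shell
#!/bin/bash
# Hypothetical helper: build the pattern for a single word, using the same
# pipeline as the script. tr -s squeezes adjacent repeated letters, and sed
# puts ".*" in front of every character, then drops the leading ".*".
word_to_pattern() {
    printf '%s\n' "$1" | tr -s 'A-Za-z' | sed 's/./\.\*&/g;s/^\.\*//'
}

word_to_pattern google    # prints g.*o.*g.*l.*e
```

Note that in the loop this pipeline starts several processes for every line of words.txt, which is likely where most of the running time goes.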
This works fine, but once words.txt gets large the program becomes too slow. How can I speed it up, possibly by matching subsequences faster?
Edit after the possible solution from Kamilcuk:
That solution returns quick, quiff, quin and qwerty for the string "qwertyuihgfcvbnhjk", but only quick should be returned, so it is almost correct but not quite.
Try it like so:
grep -x "$(<<<"$1" tr -s 'A-Za-z' | sed 's/./&*/g;s/\*$//;s/\*//1')" words.txt
Tested against:
set -- apple
cat >words.txt <<EOF
aple
al
pp
ale
fdafda
apppppppple
apple
google
EOF
outputs:
aple
ale
apppppppple
apple
And for set -- greogdgedlqfe it outputs just google.
If I understand you correctly, a "subsequence" of apple is everything that matches ap*l*e.
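To see how the grep pattern is put together, here is a small sketch that isolates the `tr` and `sed` part of the command above. The function name `arg_to_pattern` is hypothetical, and the sketch assumes the argument contains only letters, so that no character in it is special to grep.

```shell
#!/bin/bash
# Hypothetical helper: turn the argument into the pattern passed to grep -x.
#   tr -s 'A-Za-z'   squeezes adjacent repeats:        apple   -> aple
#   s/./&*/g         puts * after every character:     aple    -> a*p*l*e*
#   s/\*$//          makes the last letter mandatory:  a*p*l*e
#   s/\*//1          makes the first letter mandatory: ap*l*e
arg_to_pattern() {
    printf '%s\n' "$1" | tr -s 'A-Za-z' | sed 's/./&*/g;s/\*$//;s/\*//1'
}

arg_to_pattern apple    # prints ap*l*e

# grep -x only accepts lines that match the pattern as a whole.
printf '%s\n' aple al pp ale google | grep -x "$(arg_to_pattern apple)"
```

Because the pattern is built once from the argument and grep reads words.txt in a single process, this avoids starting new processes for each line.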