I have a phrase like "I am searching for a text" and a text file that contains a list of words, one entry per line.
I have to find whether each contiguous combination of words from the phrase is present in the text file.
For example, I have to search for the occurrences of "I", "I am", "I am searching", "I am searching for", "searching for", etc.
I prefer to write this in Perl, and I need an efficient solution that runs fast.
Example text file:
I
am searching
Text
searching for
searching for a
for searching    <-- my program should not match this (the words are out of order)
etc.
The code below prints all the contiguous sub-phrases that you want to match.
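use strict;
use warnings;

my $phrase = 'I am searching for a text';
$\ = "\n";    # append a newline to every print

# Record each word with its start and end offsets in $phrase
# (@- and @+ hold the offsets of the last match, so $& is not needed)
my @words;
while ( $phrase =~ /\b(\w+)\b/g ) {
    push @words, { word => $1, begin => $-[0], end => $+[0] };
}

my $num_words = @words;    # array in scalar context gives its length
print 'there are ', $num_words, ' words';
print 'Indices:';

# Every run of words $i..$j is one sub-phrase; slice it out of the original text
for my $i ( 0 .. $num_words - 1 ) {
    for my $j ( $i .. $num_words - 1 ) {
        my ( $start, $finish ) = ( $words[$i]{begin}, $words[$j]{end} );
        my $sub_phrase = substr $phrase, $start, $finish - $start;
        print "$i-$j: $sub_phrase";
    }
}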
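If you don't need the character offsets, a shorter variant is to split the phrase on whitespace and join each slice of words. This is only a sketch: the helper name sub_phrases is hypothetical, and it assumes words are separated by whitespace, so punctuation stays attached to words and the original spacing is not preserved. A phrase of n words has n(n+1)/2 contiguous sub-phrases, so the 6-word example gives 21.

```perl
use strict;
use warnings;

# Hypothetical helper: return every contiguous run of words in $phrase,
# joined by single spaces. Assumes whitespace-separated words.
sub sub_phrases {
    my ($phrase) = @_;
    my @words = split ' ', $phrase;    # ' ' splits on runs of whitespace
    my @result;
    for my $i ( 0 .. $#words ) {
        for my $j ( $i .. $#words ) {
            push @result, join ' ', @words[ $i .. $j ];
        }
    }
    return @result;
}

my @subs = sub_phrases('I am searching for a text');
print scalar(@subs), " sub-phrases\n";    # 6 words give 6*7/2 = 21
print "$_\n" for @subs;
```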
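Some explanations:
To complete your exercise, save all the sub-phrases into an array: instead of calling print, push each one onto an array such as @sub_phrases (they are contiguous sub-phrases, not permutations). Then iterate through your text file and, for each line, check whether the whole line equals one of the sub-phrases. Use equality rather than a substring or regex match; otherwise a line such as "for searching" would be accepted because it contains "for".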
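A possible sketch of the full exercise follows. It departs from the description above in one way: instead of comparing each line against every sub-phrase in turn, it stores the sub-phrases as hash keys, so checking a line is a single lookup. The helper name matching_lines is hypothetical, and the comparison rules (case-insensitive, surrounding and repeated whitespace ignored, the whole line must equal a sub-phrase) are assumptions based on the example file, where "Text" is listed but "for searching" should not match.

```perl
use strict;
use warnings;

# Hypothetical helper: return the lines read from $fh that equal one of the
# contiguous sub-phrases of $phrase. Assumed rules: case-insensitive,
# whitespace normalised, whole-line comparison.
sub matching_lines {
    my ( $phrase, $fh ) = @_;

    # Build a hash of every sub-phrase once, so each line is a single lookup
    my @words = split ' ', lc $phrase;
    my %wanted;
    for my $i ( 0 .. $#words ) {
        for my $j ( $i .. $#words ) {
            $wanted{ join ' ', @words[ $i .. $j ] } = 1;
        }
    }

    my @found;
    while ( my $line = <$fh> ) {
        chomp $line;
        my $key = join ' ', split ' ', lc $line;    # normalise case and spacing
        push @found, $line if $wanted{$key};
    }
    return @found;
}

# Usage with the example file contents, read from an in-memory filehandle
my $text = "I\nam searching\nText\nsearching for\nsearching for a\nfor searching\n";
open my $fh, '<', \$text or die "cannot open in-memory file: $!";
print "$_\n" for matching_lines( 'I am searching for a text', $fh );
close $fh;
```

For a real file, open it by name instead (for example open my $fh, '<', $filename, where $filename is whatever your file is called). Building the hash costs n(n+1)/2 insertions for an n-word phrase, after which each line of the file is checked in constant time on average.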