I have a typical pattern searching problem where I need to identify where multiple patterns are appearing within an array and single them out.
ex: ['horse', 'camel', 'horse', 'camel', 'tiger', 'horse', 'camel', 'horse', 'camel']
function should return
['horse', 'camel'],
['horse', 'camel', 'horse'],
['camel', 'horse', 'camel'],
['horse', 'camel', 'horse', 'camel']
i.e. finding patterns that are repeating within an array which can become a sub-array,
Or the other way of defining is -> Find all the sub-arrays which are occurring more than 1 times in main array.
i.e. resulting arrays should have length > 1
->
[1, 2, 3, 1, 2, 1, 4, 5]
=> [1,2,3]
and [1,4,5]
both are sub-arrays but [1,2,3]
is recurring/repeating sub-array NOT [1,4,5]
Looking for a suitable efficient algorithm instead of brute-force looping solutions.
This probably isn't what you want but I don't know what you have tried yet so maybe it could be useful. Here's my direct approach which probably falls under your "brute-force looping solutions" but I figured give it a try since nobody has posted full answer.
In java:
// use this to not add duplicates to list
static boolean contains (List<String[]> patterns, String[] pattern){
for(String[] s: patterns)
if (Arrays.equals(pattern,s)) return true;
return false;
}
/**
*
* @param str String array containing all elements in your set
* @param start index of subarray
* @param end index of subarray
* @return if subarray is a recurring pattern
*/
static boolean search (String[] str,int start,int end) {
// length of pattern
int len = end - start + 1;
// how many times you want pattern to
// appear in text
int n = 1;
// increment m if pattern is matched
int m = 0;
// shift pattern down the array
for (int i = end+1; i <= str.length - len; i++) {
int j;
for (j = 0; j < len; j++) {
if (!str[i + j].equals(str[start + j]))
break;
}
// if pattern is matched at [i to i+len]
if (j == len) {
m++;
if (m == n) return true;
}
}
return false;
}
/**
*
* @param str String array containing all elements in your set
* @return a list of subsets of input set which are a recurring pattern
*/
static List<String[]> g (String[] str) {
// put patterns in here
List<String[]> patterns = new ArrayList<>();
// iterate through all possible subarrays in str
for(int i = 0; i < str.length-1; i++){
for(int j = i + 1; j < str.length; j++){
// if a pattern is found
if (search(str,i,j)) {
int len = j-i+1;
String[] subarray = new String[len];
System.arraycopy(str,i,subarray,0,len);
if (!contains(patterns,subarray))
patterns.add(subarray);
}
}
}
return patterns;
}
public static void main(String[] args) {
String[] str = {"horse", "camel", "horse", "camel", "tiger",
"horse", "camel", "horse", "camel"};
// print out
List<String[]> patterns = g(str);
for (String[] s: patterns)
System.out.println(Arrays.toString(s));
}
Output:
[horse, camel]
[horse, camel, horse]
[horse, camel, horse, camel]
[camel, horse]
[camel, horse, camel]
As mentioned in a comment i posted:
"would [camel, horse]
be included in the output?"
The output I have goes with this as there are 2 instances of [camel, horse]
at indices [1-2]
and [6-7]
. But maybe I am completely misunderstanding your question and I'm not understanding the constraints.
As for optimizing, the search(...)
method for example is just a simple substring search there are some more optimized ways of doing this e.g. Knuth–Morris–Pratt. Sorry if this was exactly what you didn't want but maybe there's some use