Search code examples
javafinite-automatanfaautomatonfinite-state-automaton

Automaton for prefix matching


Using an open source Java automaton library, eg: org.apache.lucene.util.automaton or dk.brics.automaton, how can I build an automaton for prefix matching?

eg: an automaton created from the set of strings ["lucene", "lucid"], that will match when given "luc", or "luce", but not match when given "lucy" or "lucid dream".


Solution

  • Prefix matching is possible using org.apache.lucene.util.automaton by setting all states to accept, eg:

        String[] strings = new String[]{"lucene", "lucid dream"};
        final List<BytesRef> terms = new ArrayList<>();
        for(String s : strings) {
            terms.add(new BytesRef(s));
        }
        Collections.sort(terms);
        final Automaton a = DaciukMihovAutomatonBuilder.build(terms);
    
        for (int i = 0; i < a.getNumStates(); i++) {
            a.setAccept(i, true);
        }