Search code examples
androidfirebasefirebase-realtime-databaselevenshtein-distancefuzzy-search

Firebase advanced fuzzy search with levenshtein ordering and word by word


Search is one of the most important parts of my current application. It needs to feel like a fast, accurate, global search. The app is based on Firebase and I find Firebase's equalTo() / startAt() combination to be fairly lacking in this aspect.

Current situation

What I want to achieve:

  • The results ordered by relevance
  • To match word by word (so öö pime should yield above result)
  • Search in multiple properties (so põhjala pime should yield above result)
  • Fuzzy search (levenshtein?) - pojala should match Põhjala

What I already did

Instead of using equalTo(), I download an entire branch (for example beers), and loop through it, performing my own contains(). This works and is reasonably fast. However, it lacks all the things I mentioned. Here's the current code.

           final ArrayList<SearchBeerAdapter.BeerBrewery> searchResults = new ArrayList<>();
            FirebaseUtil.getBeersRef().orderByChild("name").addValueEventListener(new ValueEventListener() {
                @Override
                public void onDataChange(final DataSnapshot ogDS) {
                    int childCounter = 0;
                    for (DataSnapshot ds: ogDS.getChildren()){
                        childCounter++;
                        if (resultCounter[0] < 5) {
                            final Beer beer = ds.getValue(Beer.class);
                            final String beerId = ds.getKey();

                            // Limit to five results and remove listener
                            final int finalChildCounter = childCounter;
                            FirebaseUtil.getBreweriesRef().child(beer.getBreweryId()).addListenerForSingleValueEvent(new ValueEventListener() {
                                @Override
                                public void onDataChange(DataSnapshot dataSnapshot) {
                                    Brewery brewery = dataSnapshot.getValue(Brewery.class);
                                    if (beer.getFullName().toLowerCase().contains(query.toLowerCase()) || brewery.getName().toLowerCase().contains(query.toLowerCase())) {
                                        resultCounter[0] = resultCounter[0] + 1;
                                        if (resultCounter[0] < 5) {
                                            searchResults.add(new SearchBeerAdapter.BeerBrewery(beer, brewery, beerId));
                                        }
                                    }

                                    // Initialize the adapter once we've hit the end of our results
                                    if (finalChildCounter == ogDS.getChildrenCount()){
                                        SearchBeerAdapter sa = new SearchBeerAdapter(searchResults,glide);
                                        rv.setAdapter(sa);
                                    }

                                }

                                @Override
                                public void onCancelled(DatabaseError databaseError) {

                                }
                            });
                        }

                    }
                }

                @Override
                public void onCancelled(DatabaseError databaseError) {

                }
            });

I'm guessing what needs to be done is each match needs to get a score in searchResults and after we're done looping the ArrayList needs to be sorted by this score. My main question would boil down to how to get the best score considering above criteria. Any libraries or code samples would be extremely welcome.

Thanks in advance.


Solution

  • After some failed attempts at doing my own scoring and Googling a lot I found FuzzyWuzzy. This rather nice library uses levenshtein, but has a extractTop() and extractAll() functionality. It is actually a partial fuzzy search and perfect for this case.

    The library just searches in Strings, but you can work around that by creating a string only array and a reference array.