search, freebase, string-search

What algorithm does Freebase use to match by name?


I'm trying to build a local version of the Freebase Search API using their quad dumps, and I'm wondering what algorithm they use to match names. As an example, if you go to freebase.com and type in "Hiking", you get:

  • "Apo Hiking Society"
  • "Hiking"
  • "Hiking Georgia"
  • "Hiking Virginia's national forests"
  • "Hiking trail"

Solution

  • Wow, a lot of guesses! I hope I don't muddy the waters too much by not guessing too.

    The auto-complete box is powered by Freebase Suggest, which is powered, in turn, by the Freebase Search service. Strings which are indexed by the search service for matching include: 1) the name, 2) all aliases in the given language, 3) link anchor text from the associated Wikipedia articles and 4) identifiers (called keys by Freebase), which include things like Wikipedia article titles (and redirects).

    How the various things are weighted/boosted hasn't been disclosed, but you can get a feel for it by playing with the service for a while. As you can see from the API, there's also the ability to do filtering/weighting by types and other criteria, and this can come into play depending on the context. For example, if you're adding a record label to an album, topics which are typed as record labels will get a boost relative to things which aren't (but you can still get to things of other types, to allow for the use case where your target topic hasn't had the appropriate type applied yet).

    So that gives you a little insight into how their service works, but why not build a search service that does what you need since you're starting from scratch anyway?

    BTW, pre-Google, the Metaweb search implementation was built on top of Lucene, so you could definitely do worse than using that as your starting point. You can read some of the details in the mailing list archive.
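
If you do roll your own, here is a rough Python sketch of the two ideas from the answer: index the name, aliases, keys and anchor text of each topic, then boost topics that carry a requested type. It is only an illustration, not how Freebase actually does it. The field weights, the boost factor, the topic IDs, the type IDs and the `Topic`/`NameIndex` names are all made up for the example, since the real weighting was never disclosed.

```python
import re
from collections import defaultdict
from dataclasses import dataclass, field

# Hypothetical weights: Freebase never published its real weighting.
FIELD_WEIGHTS = {"name": 4.0, "alias": 3.0, "key": 2.0, "anchor": 1.0}
TYPE_BOOST = 2.0  # assumed multiplier for topics carrying the requested type


@dataclass
class Topic:
    mid: str                                     # topic ID (made up here)
    name: str
    aliases: list = field(default_factory=list)
    keys: list = field(default_factory=list)     # e.g. Wikipedia titles
    anchors: list = field(default_factory=list)  # link anchor text
    types: set = field(default_factory=set)


def tokenize(text):
    """Lowercase and split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NameIndex:
    def __init__(self):
        # token -> {mid: best field weight seen for that token}
        self.postings = defaultdict(dict)
        self.topics = {}

    def add(self, topic):
        self.topics[topic.mid] = topic
        sources = [("name", [topic.name]), ("alias", topic.aliases),
                   ("key", topic.keys), ("anchor", topic.anchors)]
        for field_name, strings in sources:
            weight = FIELD_WEIGHTS[field_name]
            for s in strings:
                for tok in tokenize(s):
                    entry = self.postings[tok]
                    entry[topic.mid] = max(entry.get(topic.mid, 0.0), weight)

    def search(self, query, type_hint=None, limit=5):
        q_tokens = tokenize(query)
        if not q_tokens:
            return []
        scores = defaultdict(float)
        matched = defaultdict(int)
        for q in q_tokens:
            # Prefix match, so partially typed words still hit (autocomplete).
            # A linear scan keeps the sketch short; a trie or sorted term
            # list would be the usual choice at dump scale.
            best = {}
            for tok, entry in self.postings.items():
                if tok.startswith(q):
                    for mid, w in entry.items():
                        best[mid] = max(best.get(mid, 0.0), w)
            for mid, w in best.items():
                scores[mid] += w
                matched[mid] += 1
        results = []
        for mid, score in scores.items():
            if matched[mid] < len(q_tokens):
                continue  # every query token must match somewhere
            if type_hint and type_hint in self.topics[mid].types:
                score *= TYPE_BOOST  # boost, but do not filter out other types
            results.append((score, self.topics[mid].name))
        results.sort(key=lambda r: (-r[0], r[1]))
        return [name for _, name in results[:limit]]


index = NameIndex()
index.add(Topic("/m/t1", "Hiking", types={"/interests/hobby"}))
index.add(Topic("/m/t2", "Apo Hiking Society", types={"/music/artist"}))
index.add(Topic("/m/t3", "Hiking trail", aliases=["Footpath"]))
index.add(Topic("/m/t4", "Trekking", anchors=["hiking in the mountains"]))

print(index.search("hik"))
print(index.search("hik", type_hint="/interests/hobby"))
```

Matching on tokens rather than on the whole string is what lets "Apo Hiking Society" come back for "Hiking", and multiplying the score instead of filtering keeps untyped topics reachable, as described above. For anything bigger than a toy, Lucene (or something built on it) gives you the tokenizing, inverted index and boosting out of the box.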