java javafx lucene id3

Trying to get more matches with Lucene


I'm using Java and Lucene to match each song in a list I receive from a service against my local files. What I'm currently struggling with is finding a query that returns as many matches per song as possible. Getting at least one matching file per song would already be great.

This is what I have at the moment:

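public List<String> getMatchesForSong(String artist, String title, String album) throws ParseException, IOException {
    StandardAnalyzer analyzer = new StandardAnalyzer();

    // The title must match as a phrase (slop 2), and the artist or the album must match too.
    String defaultQuery = "(title: \"%s\"~2) AND ((artist: \"%s\") OR (album: \"%s\"))";
    // Escape query-syntax characters (quotes, colons, ...) so the input cannot break the parser.
    String searchQuery = String.format(defaultQuery,
            QueryParser.escape(title), QueryParser.escape(artist), QueryParser.escape(album));

    Query query = new QueryParser("title", analyzer).parse(searchQuery);

    // Create the writer and searcher lazily on first use.
    if (indexWriter == null) {
        indexWriter = createIndexWriter(indexDir);
        indexSearcher = createIndexSearcher(indexWriter);
    }

    // Fetch the 20 best-scoring documents.
    TopDocs topDocs = indexSearcher.search(query, 20);

    if (topDocs.totalHits > 0) {
        return parseScoreDocsList(topDocs.scoreDocs);
    }

    return null;
}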

This works very well when there are no inconsistencies, even for non-English characters. But it does not return a single match if, for example, I receive a song titled "The Sun Was In My Eyes: Part One" while my corresponding file is titled "The Sun Was In My Eyes: Part 1", or if I receive it as "Pt. 1".

I don't get matches either when the received title has more words than the title of the corresponding file, such as "The End of all Times (Martyrs Fire)" versus "The End of all Times". The same could happen with album names.

So what I'd like to know is which improvements I should make to my code in order to get more matches.


Solution

  • I eventually found out that using a PhraseQuery for the title or album isn't the best approach, since it makes Lucene search for an exact match of the whole phrase.

    What I ended up doing was creating a TermQuery for each word of both the title and the album, and joining everything in a BooleanQuery:

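    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.queryparser.classic.ParseException;
    import org.apache.lucene.queryparser.classic.QueryParser;
    import org.apache.lucene.search.BooleanClause;
    import org.apache.lucene.search.BooleanQuery;
    import org.apache.lucene.search.PhraseQuery;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.TermQuery;

    // sanitizePhrase is my own helper (not shown here); it cleans up the phrase before splitting.
    private Query parseQueryForSong(String artist, String title, String album) throws ParseException {
        String[] artistArr = artist.split(" ");
        String[] titleArr = sanitizePhrase(title).split(" ");
        String[] albumArr = sanitizePhrase(album).split(" ");

        BooleanQuery.Builder mainQueryBuilder = new BooleanQuery.Builder();
        BooleanQuery.Builder albumQueryBuilder = new BooleanQuery.Builder();
        // The artist is still matched as a phrase.
        PhraseQuery artistQuery = new PhraseQuery("artist", artistArr);

        // Every title word is optional (SHOULD); more matching words give a higher score.
        for (String titleWord : titleArr) {
            if (!titleWord.isEmpty()) {
                mainQueryBuilder.add(new TermQuery(new Term("title", titleWord)), BooleanClause.Occur.SHOULD);
            }
        }

        // Album words are optional among themselves.
        for (String albumWord : albumArr) {
            if (!albumWord.isEmpty()) {
                albumQueryBuilder.add(new TermQuery(new Term("album", albumWord)), BooleanClause.Occur.SHOULD);
            }
        }

        mainQueryBuilder.add(artistQuery, BooleanClause.Occur.MUST);
        // Require an album match only when there are album words; an empty MUST clause matches nothing.
        BooleanQuery albumQuery = albumQueryBuilder.build();
        if (!albumQuery.clauses().isEmpty()) {
            mainQueryBuilder.add(albumQuery, BooleanClause.Occur.MUST);
        }

        // Re-parsing the query's string form runs the terms through the analyzer (e.g. lowercasing).
        // Note that toString() is not guaranteed to produce parseable syntax for arbitrary input.
        StandardAnalyzer analyzer = new StandardAnalyzer();
        return new QueryParser("title", analyzer).parse(mainQueryBuilder.build().toString());
    }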
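
The sanitizePhrase helper used above is not shown in the answer. The following is only a minimal sketch of what such a helper could look like, not the original implementation: it assumes that lowercasing and stripping punctuation is enough, and the small synonym table (mapping "pt" to "part" and spelled-out numbers to digits) is a hypothetical addition aimed at the "Part One" / "Pt. 1" case from the question. The class name TitleSanitizer is made up for the example, and it uses only the Java standard library.

```java
import java.util.Locale;
import java.util.Map;

/** Hypothetical stand-in for the sanitizePhrase helper; the original is not shown. */
final class TitleSanitizer {

    // Assumed synonym table: variants are mapped to one canonical token.
    private static final Map<String, String> CANONICAL = Map.of(
            "pt", "part",
            "one", "1",
            "two", "2",
            "three", "3");

    static String sanitizePhrase(String phrase) {
        if (phrase == null) {
            return "";
        }
        // Lowercase, then collapse every run of non-letter, non-digit
        // characters (punctuation, parentheses, whitespace) into one space.
        String cleaned = phrase.toLowerCase(Locale.ROOT)
                .replaceAll("[^\\p{L}\\p{N}]+", " ")
                .trim();
        if (cleaned.isEmpty()) {
            return "";
        }
        StringBuilder out = new StringBuilder();
        for (String word : cleaned.split(" ")) {
            if (out.length() > 0) {
                out.append(' ');
            }
            // Replace known variants, keep every other word unchanged.
            out.append(CANONICAL.getOrDefault(word, word));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints: the sun was in my eyes part 1
        System.out.println(sanitizePhrase("The Sun Was In My Eyes: Part One"));
        // prints: the sun was in my eyes part 1
        System.out.println(sanitizePhrase("The Sun Was In My Eyes: Pt. 1"));
        // prints: the end of all times martyrs fire
        System.out.println(sanitizePhrase("The End of all Times (Martyrs Fire)"));
    }
}
```

Stripping parentheses and colons also keeps those characters out of the string that is later handed to QueryParser. The synonym mapping only helps if the same normalization is applied to the tag values when the files are indexed; otherwise "1" in the query would still not match "one" in the index.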