Search code examples
pythonmusicbrainz

Filter results from browse_release_groups by artist_id to get discography, python


I'm trying to retrieve discographies for various artists. Wikipedia and the manual web interface for MusicBrainz.org seem to agree on what albums make this up, for the artists I've checked. My first thought was to attempt to screen-scrape either of these resources, but that looks like hard work to do it properly.

Direct queries of the musicbrainz data seemed to offer a quicker way to get clean data. I would ideally construct a request like this ...

data = get_release_groups(artist=mbid,
                          primary_type='Album',
                          status='Official',
                          includes=['first_release_date',
                                    'title',
                                    'secondary_type_list'])

I chose to use the python wrapper musicbrainsngs, as I am fairly experienced with python. It gave me a choice of three methods, get_, search_ and browse_. Get_ will not return sufficient records. Browse_ appeared to be what I wanted, so I tried that first, especially as search_ was documented around looking for text in the python examples, rather than the mb_id, which I already had.

When I did a browse_release_groups(artist=artist_id,,,), I got a list of release groups, each containing the data I wanted, which was album title, type and year. However, I also got a large number of other release groups that don't appear on their manual web results for (for example The Rolling Stones) https://musicbrainz.org/artist/b071f9fa-14b0-4217-8e97-eb41da73f598

There didn't appear to be any way to filter in the query for status='official', or to include the status as part of the results so I could manually filter.

In response to this question, Wieland has suggested I use the search_ query. I have tested search_release_groups(arid=mbid, status='official', primarytype='Album', strict=True, limit=...) and this returns many fewer release groups. As far as studio albums are concerned, it matches 1:1. There are still a few minor discrepancies in the compilations, which I can live with. However, this query did not return the first-release-date, and so far, it has been resistant to my attempts to find how to include it. I notice in the server search code linked to that every query starts off manipulating rgm.first_release_date_year etc, but it's not clear how/when this gets returned from a query.

It's just occurred to me that I can use both a browse_ and a search_ , as together they give me all the information. So I have a work around, but it feels rather agricultural.

TL;DR I want release groups (titles, dates, types, status) by artist ID. If I browse, I get dates, but can't include or filter by status. If I search, I can filter by status, but don't get dates. How can I get both in one query?


Solution

  • I'm not entirely sure what your question is, but the find_by_artist method of release groups (source here) is what's doing the filtering of release groups for the artist pages, in particular:

         # Show only RGs with official releases by default, plus all-status-less ones so people fix the status
        unless ($show_all) {
        push @$conditions, "(EXISTS (SELECT 1 FROM release where release.release_group = rg.id AND release.status = '1') OR
                            NOT EXISTS (SELECT 1 FROM release where release.release_group = rg.id AND release.status IS NOT NULL))";
        }
    

    Unfortunately, I think it's not possible to express that condition in a normal web service call. You can, however, use the search web service to filter for release groups by the rolling stones that contain at least one "official" release: http://musicbrainz.org/ws/2/release-group/?query=arid:b071f9fa-14b0-4217-8e97-eb41da73f598%20AND%20status:official&offset=0. In python-musicbrainzngs, the call for this is

    search_release_groups(arid="b071f9fa-14b0-4217-8e97-eb41da73f598", status="official", strict=True)
    

    Unfortunately, the search results don't include the first-release-date field. There's an open ticket about it, but it's not going to be fixed in the near future.