python api amazon-product-api categorization

Categorize book authors as fiction vs non-fiction


For personal use, I have a list of about 300 authors (full names) of various books. I want to partition this list into "fiction authors" and "non-fiction authors". If an author writes both, the majority decides.

I looked at the Amazon Product Search API: I can search by author (in Python), but there is no way to find the book category (fiction vs. the rest):

>>> node = api.item_search('Books', Author='Richard Dawkins')
>>> for book in node.Items.Item:
...     print(book.ItemAttributes.Title)

What are my options? I prefer to do this in Python.


Solution

  • You could try another service: the Google Book Search API. To use it from Python, have a look at gdata-python-api. In its protocol, the result feed contains a <dc:subject> node, which is probably what you need:

    <?xml version="1.0" encoding="UTF-8"?>
    <feed xmlns="http://www.w3.org/2005/Atom"
          xmlns:openSearch="http://a9.com/-/spec/opensearchrss/1.0/"
          xmlns:gbs="http://schemas.google.com/books/2008" 
          xmlns:dc="http://purl.org/dc/terms"
          xmlns:gd="http://schemas.google.com/g/2005">
      <id>http://www.google.com/books/feeds/volumes</id>
      <updated>2008-08-12T23:25:35.000</updated>
    
    <!-- a lot of information here; those nodes were removed to save space -->
      <entry>
        <dc:creator>Jane Austen</dc:creator>
        <dc:creator>James Kinsley</dc:creator>
        <dc:creator>Fiona Stafford</dc:creator>
        <dc:date>2004</dc:date>
        <dc:description>
          If a truth universally acknowledged can shrink quite so rapidly into 
          the opinion of a somewhat obsessive comic character, the reader may reasonably feel ...
        </dc:description>
        <dc:format>382</dc:format>
        <dc:identifier>8cp-Z_G42g4C</dc:identifier>
        <dc:identifier>ISBN:0192802380</dc:identifier>
        <dc:publisher>Oxford University Press, USA</dc:publisher>
        <dc:subject>Fiction</dc:subject>
        <dc:title>Pride and Prejudice</dc:title>
        <dc:title>A Novel</dc:title>
      </entry>
    </feed>
    

    Of course, this protocol also returns some extra information about each book (for example, whether or not it is visible on Google Books).
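
To connect the <dc:subject> node to the "majority gets the vote" rule from the question, here is a minimal sketch using only the Python standard library. It does not call any web service: how the feed is fetched is left out, and `SAMPLE_FEED` is a trimmed, made-up feed shaped like the example above. The function names `subjects_from_feed` and `classify_author` are hypothetical. Treating a book as fiction when the word "Fiction" appears in one of its subjects, and sending ties to non-fiction, are both assumptions you may want to change.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Namespace URIs copied from the sample feed shown above.
NS = {
    "atom": "http://www.w3.org/2005/Atom",
    "dc": "http://purl.org/dc/terms",
}

# Trimmed, illustrative feed (not real API output).
SAMPLE_FEED = """\
<feed xmlns="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/terms">
  <entry>
    <dc:creator>Jane Austen</dc:creator>
    <dc:subject>Fiction</dc:subject>
    <dc:title>Pride and Prejudice</dc:title>
  </entry>
</feed>
"""


def subjects_from_feed(feed_xml):
    """Return one list of <dc:subject> strings per <entry> in the feed."""
    root = ET.fromstring(feed_xml)
    return [
        [s.text.strip() for s in entry.findall("dc:subject", NS) if s.text]
        for entry in root.findall("atom:entry", NS)
    ]


def classify_author(subjects_per_book):
    """Majority vote over an author's books, based on their subject lists."""
    votes = Counter()
    for subjects in subjects_per_book:
        if not subjects:
            continue  # no subject data, so this book casts no vote
        # Assumption: a subject containing the word "fiction" marks fiction.
        is_fiction = any("fiction" in s.lower().split() for s in subjects)
        votes["fiction" if is_fiction else "non-fiction"] += 1
    if not votes:
        return "unknown"
    # Assumption: ties go to non-fiction.
    return "fiction" if votes["fiction"] > votes["non-fiction"] else "non-fiction"


if __name__ == "__main__":
    books = subjects_from_feed(SAMPLE_FEED)
    print(books)                   # [['Fiction']]
    print(classify_author(books))  # fiction
```

For the full list of about 300 authors, you would collect the subject lists of each author's books from whatever service you end up using and pass them to `classify_author`. Authors that come back as "unknown" can then be sorted by hand.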