python web-scraping goose

How do I get the author of an article using python-goose?


I'm trying to scrape articles from news agencies, but I can't figure out how to get the author of an article using python-goose. I've read through the documentation and the source code, and searched Google.

from goose import Goose

def getArticle(url):
    g = Goose()
    article = g.extract(url=url)
    print article.title
    # Attempts at getting the author that did not work:
    # print article.author
    # print article.writer

So, is there a built-in way to extract the author of an article using python-goose?

Link to the python-goose code and documentation: http://github.com/grangier/python-goose


Solution

  • From their documentation:

    Goose will try to extract the following information:

    • Main text of an article
    • Main image of article
    • Any Youtube/Vimeo movies embedded in article
    • Meta Description
    • Meta tags

    The author is not on that list, so Goose doesn't promise to extract it. You will need to inspect the page's metadata yourself to see whether an author is included, and extract it manually.
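
    As a rough sketch of that manual step, the snippet below scans a page's HTML for meta tags that often carry an author name. It uses only the standard library and does not call Goose at all. The class AuthorMetaParser, the function find_author, and the list of meta keys are hypothetical names and choices made for illustration; sites differ widely, and many put the byline in the page body rather than in a meta tag, so this may well return nothing.

```python
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser  # Python 2


class AuthorMetaParser(HTMLParser):
    """Collect the content of meta tags that commonly hold an author name."""

    # Assumed set of keys; not exhaustive, extend it for the sites you scrape.
    AUTHOR_KEYS = ("author", "article:author", "dc.creator")

    def __init__(self):
        HTMLParser.__init__(self)
        self.authors = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        # Some sites use name="...", others property="...".
        key = (attrs.get("name") or attrs.get("property") or "").lower()
        content = (attrs.get("content") or "").strip()
        if key in self.AUTHOR_KEYS and content:
            self.authors.append(content)


def find_author(html):
    """Return the first author found in the meta tags, or None."""
    parser = AuthorMetaParser()
    parser.feed(html)
    parser.close()
    return parser.authors[0] if parser.authors else None
```

    If the extracted article exposes the page source (I believe as article.raw_html, but treat that attribute name as an assumption and check it against your installed version), the call would look like find_author(article.raw_html).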