Using Python, how can I read plain text from a Google Doc?


I am attempting to read the raw text/content of a Google Doc (just a plain document, not a spreadsheet or presentation) from within a Python script, but so far have had little success.

Here's what I've tried:

import gdata.docs.service

# Log in to the Documents List service.
client = gdata.docs.service.DocsService()
client.ClientLogin('email', 'password')

# Query for the documents inside a named folder.
q = gdata.docs.service.DocumentQuery()
q.AddNamedFolder('email', 'Folder Name')
feed = client.Query(q.ToUri())
doc = feed.entry[0]  # extract one of the documents

However, this variable doc, which is of type gdata.docs.DocumentListEntry, doesn't seem to contain any content, just meta information about the document.

Am I doing something wrong here? Can somebody point me in the right direction? Thank you!


Solution

  • A DocumentQuery doesn't return all the documents along with their contents; that would take forever. It returns only a list of documents, with metadata about each. (Actually, IIRC you can get a preview page this way, so if your document is only one page, that might be enough…)

    You then need to download the content in a separate request. The entry's content element has a type (the MIME type) and a src (the URL of the actual data). You can download that src and parse it. However, you can override the default type by adding an exportFormat parameter to that URL (for plain text, exportFormat=txt), so you don't need to do any parsing.

    See the section Downloading documents and files in the docs, which has an example showing how to download a document and specify a format. (It's in .NET rather than Python, and it uses HTML rather than plain text, but you should be able to figure it out.)
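
To illustrate the exportFormat step in Python, here is a minimal sketch that only builds the download URL. The helper name with_export_format and the example URL are hypothetical, and it uses Python 3's standard urllib.parse rather than anything from gdata. It assumes you already have the content src URL as a string, for example from doc.content.src on the entry in the question.

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse


def with_export_format(src_url, export_format="txt"):
    """Return src_url with its exportFormat query parameter set.

    Hypothetical helper: it only rewrites the URL and does not
    perform the download or any authentication.
    """
    parts = urlparse(src_url)
    # Keep the existing query parameters, dropping any earlier exportFormat.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "exportFormat"
    ]
    query.append(("exportFormat", export_format))
    return urlunparse(parts._replace(query=urlencode(query)))


if __name__ == "__main__":
    # Made-up URL standing in for the src of a document entry.
    src = "https://docs.example.com/feeds/download/documents/Export?docID=abc123"
    print(with_export_format(src))
```

The resulting URL still has to be fetched with the same authenticated client that ran the query; an anonymous request would presumably be rejected for a private document.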