I am attempting to read the raw text/content of a Google Doc (just a plain document, not a spreadsheet or presentation) from within a Python script, but so far have had little success.
Here's what I've tried:
import gdata.docs.service
client = gdata.docs.service.DocsService()
client.ClientLogin('email', 'password')
q = gdata.docs.service.DocumentQuery()
q.AddNamedFolder('email', 'Folder Name')
feed = client.Query(q.ToUri())
doc = feed.entry[0] # extract one of the documents
However, this variable doc, which is of type gdata.docs.DocumentListEntry, doesn't seem to contain any content, just meta information about the document.
Am I doing something wrong here? Can somebody point me in the right direction? Thank you!
A DocumentQuery doesn't return all the documents with their contents; that would take forever. It just returns a list of documents, with metadata about each. (Actually, IIRC you can get a preview page this way, so if your document is only one page that might be enough…)
You then need to download the content in a separate request. The content element has a type (the MIME type) and a src (the URL to the actual data). You can just download that src, and parse it. However, you can override the default type by adding an exportFormat parameter, so you don't need to do any parsing.
See the section Downloading documents and files in the docs, which has an example showing how to download a document and specify a format. (It's in .NET rather than Python, and it uses HTML rather than plain text, but you should be able to figure it out.)
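For a rough idea of the Python side, here is a minimal sketch of just the URL step: taking the src URL from the entry's content element and setting exportFormat on it. The helper name with_export_format, the "txt" format value, and the example URL are my own assumptions for illustration, not something taken from the docs, and the sketch uses only the Python 3 standard library. Actually fetching the URL still has to go through an authenticated request, which is not shown here.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_export_format(src_url, export_format="txt"):
    """Return src_url with its exportFormat query parameter set.

    Hypothetical helper: "txt" is an assumed value for plain text;
    check the API documentation for the formats it really accepts.
    """
    parts = urlsplit(src_url)
    # Keep every existing parameter except a previous exportFormat.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "exportFormat"
    ]
    query.append(("exportFormat", export_format))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Made-up URL standing in for the src of a document's content element.
example_src = "https://docs.example.com/feeds/download/documents/Export?docID=abc123"
plain_text_url = with_export_format(example_src)
```

In your script, the src would presumably come from something like doc.content.src, and you would then download plain_text_url with the same logged-in client you used for the query.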