Tags: python, pdf, google-drive-api, google-docs, google-docs-api

How do I convert a Google Docs file (a template document) stored on Google Drive to PDF and download it?


I've been stuck on this for a few hours now. I have a template that I can edit with Python. The idea is to copy the template, edit the copy, convert it, download it and then delete the copy from Drive, leaving only the empty template. I've read through the documentation and tried different methods, but I can't figure it out.

Edit: my code so far

import os.path
import pickle

from google.auth.transport.requests import Request
from google_auth_oauthlib.flow import InstalledAppFlow
from googleapiclient.discovery import build

SCOPES = ['https://www.googleapis.com/auth/drive']


DOCUMENT_ID = '1ZgaYCra-7m_oWIBWK9RoMssYNTAsQLa1ELI1vyBB73c'


def main():
    creds = None
    # The file token.pickle stores the user's access and refresh tokens, and is
    # created automatically when the authorization flow completes for the first
    # time.
    if os.path.exists('token.pickle'):
        with open('token.pickle', 'rb') as token:
            creds = pickle.load(token)
    # If there are no (valid) credentials available, let the user log in.
    if not creds or not creds.valid:
        if creds and creds.expired and creds.refresh_token:
            creds.refresh(Request())
        else:
            flow = InstalledAppFlow.from_client_secrets_file(
                'credentials.json', SCOPES)
            creds = flow.run_local_server(port=0)
        # Save the credentials for the next run
        with open('token.pickle', 'wb') as token:
            pickle.dump(creds, token)

    service = build('docs', 'v1', credentials=creds)

    text1 = 'test'

    # Insert text1 at fixed character indexes in the document.
    requests1 = [
        {
            'insertText': {
                'location': {
                    'index': 110,
                },
                'text': text1
            }
        },
        {
            'insertText': {
                'location': {
                    'index': 98,
                },
                'text': text1
            }
        },
        {
            'insertText': {
                'location': {
                    'index': 83,
                },
                'text': text1
            }
        },
        {
            'insertText': {
                'location': {
                    'index': 72,
                },
                'text': text1
            }
        },
        {
            'insertText': {
                'location': {
                    'index': 49,
                },
                'text': text1
            }
        },
    ]

    result = service.documents().batchUpdate(
        documentId=DOCUMENT_ID, body={'requests': requests1}).execute()


if __name__ == '__main__':
    main()


Solution

  • I believe your current situation and goal are as follows.

    • You have a Google Document as a template document.
    • You want to achieve the following flow using googleapis for Python.
      1. Copy template Document.
      2. Update copied Document.
      3. Download the updated Document as PDF file.
      4. Delete the copied Document.

    Modification points:

    • Your script updates the existing Google Document in place. To achieve your goal, a different flow is needed.
    • The Drive API is used to copy the template Document, download the copy as a PDF file and delete the copy, while the Docs API is used to update the copy, as follows.
      1. Copy template Document.
        • In this case, the Drive API is used.
      2. Update copied Document.
        • In this case, the Docs API is used, with the update requests taken from your script.
      3. Download the updated Document as PDF file.
        • In this case, the Drive API is used.
      4. Delete the copied Document.
        • In this case, the Drive API is used.

    When your script is modified, it becomes as follows.

    Modified script:

    This modified script assumes that the creds object passed as credentials=creds comes from your script. Before you use it, please set the variables below.

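    # These imports belong at the top of your script.
    import io

    from googleapiclient.discovery import build
    from googleapiclient.http import MediaIoBaseDownload

    templateDocumentId = '###' # Please set the Document ID.
    outputPDFFilename = 'sample.pdf' # Please set the output PDF filename.
    
    drive = build('drive', 'v3', credentials=creds)
    docs = build('docs', 'v1', credentials=creds)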
    
    # 1. Copy template Document.
    copiedDoc = drive.files().copy(fileId=templateDocumentId, body={'name': 'copiedTemplateDocument'}).execute()
    copiedDocId = copiedDoc.get('id')
    print('Done: 1. Copy template Document.')
    
    # 2. Update copied Document.
    text1 = 'test'
    requests1 = [
        {
            'insertText': {
                'location': {
                    'index': 110,
                },
                'text': text1
            }
        },
        {
            'insertText': {
                'location': {
                    'index': 98,
                },
                'text': text1
            }
        },
        {
            'insertText': {
                'location': {
                    'index': 83,
                },
                'text': text1
            }
        },
        {
            'insertText': {
                'location': {
                    'index': 72,
                },
                'text': text1
            }
        },
        {
            'insertText': {
                'location': {
                    'index': 49,
                },
                'text': text1
            }
        },
    ]
    result = docs.documents().batchUpdate(documentId=copiedDocId, body={'requests': requests1}).execute()
    print('Done: 2. Update copied Document.')
    
    # 3. Download the updated Document as PDF file.
    request = drive.files().export_media(fileId=copiedDocId, mimeType='application/pdf')
    # The with block closes (and flushes) the PDF file once the download finishes.
    with io.FileIO(outputPDFFilename, mode='wb') as fh:
        downloader = MediaIoBaseDownload(fh, request)
        done = False
        while not done:
            status, done = downloader.next_chunk()
            print('Download %d%%.' % int(status.progress() * 100))
    print('Done: 3. Download the updated Document as PDF file.')
    
    # 4. Delete the copied Document.
    drive.files().delete(fileId=copiedDocId).execute()
    print('Done: 4. Delete the copied Document.')
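As a variation on the modified script, here is a minimal sketch (not a drop-in replacement) that wraps steps 2 and 3 in try/finally, so step 4 still deletes the copy if the update or the export raises an error. It also swaps MediaIoBaseDownload for drive.files().export(...).execute(), which returns the exported bytes directly. Both calls hit the same files.export endpoint, which is documented as limited to 10 MB of exported content; this version simply holds the whole PDF in memory before writing it. The function name fill_template_to_pdf and its parameters are hypothetical; drive and docs are assumed to be the service objects built above.

```python
def fill_template_to_pdf(drive, docs, template_id, requests, pdf_path,
                         copy_name='copiedTemplateDocument'):
    """Copy a template Document, update the copy, save it as PDF, delete the copy.

    Hypothetical helper. `drive` and `docs` are assumed to be service objects
    such as build('drive', 'v3', ...) and build('docs', 'v1', ...).
    """
    # 1. Copy the template with the Drive API.
    copied = drive.files().copy(fileId=template_id, body={'name': copy_name}).execute()
    copied_id = copied['id']
    try:
        # 2. Apply the update requests to the copy with the Docs API.
        docs.documents().batchUpdate(
            documentId=copied_id, body={'requests': requests}).execute()
        # 3. Export the copy as PDF; execute() returns the file content as bytes.
        pdf_bytes = drive.files().export(
            fileId=copied_id, mimeType='application/pdf').execute()
        with open(pdf_path, 'wb') as fh:
            fh.write(pdf_bytes)
    finally:
        # 4. Delete the copy even if step 2 or 3 raised, so only the template remains.
        drive.files().delete(fileId=copied_id).execute()
    return pdf_path
```

With the variables from the modified script, a call would look like fill_template_to_pdf(drive, docs, templateDocumentId, requests1, outputPDFFilename).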
    

    Note:

    • To download the file, 'import io' and 'from googleapiclient.http import MediaIoBaseDownload' are also required.
    • This answer assumes that your request body requests1 for service.documents().batchUpdate(documentId=DOCUMENT_ID, body={'requests': requests1}).execute() works as you expect, so please check it carefully.
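The indexes in requests1 run from highest to lowest (110 down to 49), and that order matters: batchUpdate applies requests in the order given, and each insertText shifts the indexes of all content after the insertion point. Inserting from the end of the document backwards keeps every earlier index pointing where it did in the untouched template. Below is a minimal sketch of a hypothetical helper, insert_text_requests, that builds the same request list from (index, text) pairs; it assumes the indexes were read from the unmodified template.

```python
def insert_text_requests(insertions):
    """Build Docs API insertText requests from (index, text) pairs.

    Hypothetical helper. The pairs are sorted by index in descending order so
    that each insertion only shifts text *after* the positions still to be
    filled, and every index keeps referring to the original template layout.
    """
    ordered = sorted(insertions, key=lambda pair: pair[0], reverse=True)
    return [
        {'insertText': {'location': {'index': index}, 'text': text}}
        for index, text in ordered
    ]
```

For example, insert_text_requests([(49, text1), (72, text1), (83, text1), (98, text1), (110, text1)]) returns a list equivalent to requests1.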

    References: