Tags: sharepoint, office365, databricks, fpdf, office365api

How to send a pdf object from Databricks to Sharepoint?


INTRO: I have a Databricks notebook in which I create a PDF file from some data. To generate the file I use the fpdf library:

from fpdf import FPDF, HTMLMixin

With this library I generate a PDF object whose repr is <__main__.HTML2PDF at 0x7f3b73720fd0>. My goal now is to send this PDF to a SharePoint folder. To do so, I am using the following code:

from office365.runtime.auth.user_credential import UserCredential
from office365.sharepoint.client_context import ClientContext
from office365.sharepoint.files.file import File

# paths
sharepoint_site = "MySharepointSite" 
sharepoint_folder = "Shared Documents/General/PDFs/" 
sharepoint_user = "aaa@bbb.onmicrosoft.com" 
sharepoint_user_pw = "xyz" 
sharepoint_folder = sharepoint_folder.strip("/")

# build the site URL and the server-relative folder URL
SITE_URL = f"https://sharepoint.com/sites/{sharepoint_site}"
RELATIVE_URL = f"/sites/{sharepoint_site}/{sharepoint_folder}"

# connect to sharepoint
ctx = ClientContext(SITE_URL).with_credentials(UserCredential(sharepoint_user, sharepoint_user_pw))
web = ctx.web
ctx.load(web).execute_query()

# Generate PDF
pdf = generate_pdf(ctx, row['ServerRelativeUrl'])

# HERE IS MY ISSUE!
ctx.web.get_folder_by_server_relative_url(sharepoint_folder).upload_file('test.pdf', pdf).execute_query()

PROBLEM: When I reach the last line I get the following error message:

TypeError: Object of type HTML2PDF is not JSON serializable

I believe that PDF objects cannot be serialized to JSON, so I am stuck and do not know how to send the PDF to SharePoint.
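
The wording of the error matches what Python's json module raises for any object it has no encoder for. This suggests (as an assumption, not checked against the office365 source) that upload_file treats an argument that is not raw bytes as a JSON payload. A minimal sketch that reproduces the same kind of message, using a hypothetical empty stand-in class instead of the real fpdf-based HTML2PDF:

```python
import json


class HTML2PDF:
    """Hypothetical stand-in for the fpdf-based class from the question."""


try:
    # json.dumps has no encoder for arbitrary class instances.
    json.dumps(HTML2PDF())
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

If that reading is right, the object has to be turned into the raw bytes of the PDF before it is passed to upload_file.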

QUESTION: Could you suggest a smart and elegant way to achieve my goal, i.e. sending the PDF file to SharePoint?


Solution

  • I was able to solve this problem by outputting the PDF as a string, encoding it to bytes, and finally uploading it to SharePoint:

    pdf_binary = pdf.output(dest='S').encode("latin1")
    ctx.web.get_folder_by_server_relative_url(sharepoint_folder).upload_file("test.pdf", pdf_binary).execute_query()
    

    Note: If it does not work, try a different encoding.
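
A note on why "latin1" is a reasonable choice here: Latin-1 maps each code point 0-255 to exactly one byte, so a string that holds raw PDF data one character per byte comes back unchanged. The sketch below wraps the encoding step in a hypothetical helper, to_pdf_bytes (the name is not part of fpdf or office365). It also accepts bytes or bytearray, because as far as I know newer fpdf2 releases return a bytearray from output() instead of a str; treat that as an assumption to verify against the installed version.

```python
def to_pdf_bytes(raw):
    """Normalize the return value of pdf.output(dest='S') to bytes.

    Hypothetical helper: accepts a str (one character per byte) or an
    already-binary bytes/bytearray value.
    """
    if isinstance(raw, (bytes, bytearray)):
        return bytes(raw)
    # Latin-1 encodes code points 0-255 as the single bytes 0-255.
    return raw.encode("latin-1")


# Round-trip check: every possible byte value survives the conversion.
all_bytes = bytes(range(256))
as_text = all_bytes.decode("latin-1")
round_tripped = to_pdf_bytes(as_text)
already_binary = to_pdf_bytes(bytearray(b"%PDF-1.3"))
```

With this helper the upload from the answer would read pdf_binary = to_pdf_bytes(pdf.output(dest='S')), followed by the same upload_file call.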