I'm trying to use Dropbox as a cloud-based file receptacle for an app/script. The script, written in Python, needs to take PDFs from the Dropbox and use the tika-python wrapper to convert to string.
I'm able to connect to the Dropbox API and use the files_download_to_file()
method to download the PDFs to disk, and then use the tika from_file()
method to pull that download file from the disk to process. Example:
# Download ex.pdf to local disk
dbx.files_download_to_file('/my_local_path/ex_on_disk.pdf', '/my_dropbox_path/ex.pdf')
from tika import parser
parsed = parser.from_file('ex_on_disk.pdf')
The problem is that I'm planning on running this app on something like Heroku. I don't think I'm able to save anything locally and then access it again. I'm not sure how to get something from the Dropbox API that can be directly referenced by the tika wrapper to run the same as above. I think the PHP SDK has a file_get_contents
and a file_put_contents
set of methods but it doesn't appear to have a companion in the Python SDK.
I've tried using the shareable links in place of a filename but that hasn't worked. Any ideas? I know there's also the files_download
method which downloads the FileMetadata
object but I have no idea what to do with this and am having trouble finding more about it.
TLDR; How can I reference a file on Dropbox with a filename string such as 'example.pdf' to be used in another function that is trying to read a file from disk, without saving that Dropbox file to disk?
I figured it out. I used the files_download
method to get the byte string and then use the from_buffer
method of tika instead:
md, response = dbx.files_download(path)
file_contents = response.content
parsed = parser.from_buffer(file_contents)