
Keep the original image file name when exporting a Google Doc to HTML


I'm building an app that exports Google Docs to websites, but I can't get the original image file names.

Steps:

  1. Create a Google Doc and insert a Drive image through Insert -> Image -> Drive.
  2. Use the Google Drive API to export the doc to HTML (Ruby: https://googleapis.dev/ruby/google-api-client/latest/Google/Apis/DriveV3/DriveService.html#export_file-instance_method).
  3. Extract the images from the HTML. The srcs always look similar to https://lh3.googleusercontent.com/zUmjDlO9wBwiEMnegKwkh1VPGUaaVssRmWn6BvN_-WyD8ImK-s8rgwVkjmR1Zrsd89OcelYKArsHxy9CUXREoeUm5LgfxrUU0HZVa7d7BqcUsDh5E19I4AqwX_xIv_0Tyf5b4qZm
  4. Download each image as you would any file on the web. The "content-disposition" header always has "filename=Untitled.jpg", regardless of the original file name.
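Steps 3 and 4 above can be sketched in Ruby with only the standard library. This is a minimal sketch under stated assumptions: the exported HTML is already in hand (for example, fetched with DriveService#export_file using the text/html MIME type), a regex stands in for a real HTML parser, and the method names are hypothetical helpers, not part of the google-api-client gem. reported_filename does not follow redirects.

```ruby
require "net/http"
require "uri"

# Step 3 (hypothetical helper): collect the src of every <img> tag, in document order.
# A regex is enough for a sketch; a real HTML parser would be more robust.
def extract_image_srcs(html)
  html.scan(/<img\b[^>]*?\ssrc="([^"]+)"/i).flatten
end

# Step 4 (hypothetical helper): read the filename from a Content-Disposition value,
# trying the quoted form (filename="a.jpg") before the unquoted form (filename=a.jpg).
def filename_from_content_disposition(header)
  return nil if header.nil?

  match = header.match(/filename="([^"]*)"/i) || header.match(/filename=([^;\s]+)/i)
  match && match[1]
end

# Downloads one image URL and returns the filename the server reports.
# Net::HTTP enables TLS automatically for https URLs; redirects are not followed.
def reported_filename(url)
  response = Net::HTTP.get_response(URI(url))
  filename_from_content_disposition(response["content-disposition"])
end
```

Calling reported_filename on one of the lh3.googleusercontent.com URLs is where the "Untitled.jpg" value described in step 4 would appear.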

Am I doing anything wrong? Is there a way to get the original file name?


Solution

  • The issue is that Google Docs doesn't store any details about the image aside from the ones below:

    [Screenshot: the image details that Google Docs stores]

    Thus, when the document is opened in another application, that application gives the images its own default names (e.g. image.jpg, Untitled.jpg), since it has no details about them. See a similar post.

    Summary:

    • In short, you can't get the original file name from the inserted image's details, because Google Docs doesn't store that data when the image is inserted. Even before exporting, there is no way to determine the name of an inserted image from the document.

    Workaround:

    • You could add a caption together with each image that contains a specific string (Figure <N>: <filename>), so that you can easily find the captions when extracting the images. Then, to tell which caption belongs to which image, match the string and use the image's position in the document. This is definitely not the optimal approach, but it is the simplest one to implement and follow.
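The caption workaround can be sketched the same way. This minimal sketch assumes each caption is plain text of the exact form Figure <N>: <filename>, that N counts images in document order starting at 1, and that the caption contains no HTML entities such as &nbsp;. CAPTION_PATTERN and pair_images_with_captions are hypothetical names, and the regex-based extraction is a shortcut rather than proper HTML parsing.

```ruby
# Hypothetical caption convention from the workaround: "Figure <N>: <filename>".
CAPTION_PATTERN = /Figure\s+(\d+):\s*([^\n]+)/

# Collect <img> src attributes in document order (regex shortcut, not a real parser).
def extract_image_srcs(html)
  html.scan(/<img\b[^>]*?\ssrc="([^"]+)"/i).flatten
end

# Pair each image with the caption whose number equals the image's 1-based position.
# Returns [src, filename] pairs; filename is nil when no matching caption exists.
def pair_images_with_captions(html)
  # Replace tags with newlines so caption text split across <span>s can still match.
  text = html.gsub(/<[^>]+>/, "\n")
  names = {}
  text.scan(CAPTION_PATTERN) { |number, name| names[number.to_i] = name.strip }
  extract_image_srcs(html).each_with_index.map { |src, i| [src, names[i + 1]] }
end
```

Keying on the figure number rather than on caption order means a caption that is accidentally moved still pairs with the right image, as long as its number is correct.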